Talk:Luus–Jaakola

I have trouble editing today, and presumably for a few days. I note that the article on the Nelder–Mead simplex algorithm (sic, since it is a heuristic per Powell, 1973) has references on pattern-search algorithms. I believe it includes the first articles by Chinese mathematicians, which introduced positive-basis theory well before Torczon, Lagarias, et al. Kiefer.Wolfowitz (Discussion) 19:27, 25 February 2011 (UTC)

Thanks for contributing to this article. It is clear that you are highly skilled in certain areas of classical optimization and numerical analysis.
I would like to suggest a few changes to this article.
First, on another talk page you raised the question of notability and whether to move the article to a sandbox or delete it. LJ and its variants have been used in numerous applications and publications. This makes it notable and suitable for inclusion in an encyclopedia. I also seem to recall that it is in encyclopedias for (meta)heuristics, such as handbooks for evolutionary computation, etc. I could check my archive/library but it isn't necessary as its use in publications already merits inclusion in Wikipedia. Whether one personally likes or dislikes the method is of no importance.
Second, the introduction of the LJ article now focuses very much on convergence aspects and advanced mathematical analysis, using terminology that only mathematical experts will understand. The introduction dwells on what LJ cannot do, while its strengths are given little mention; almost a footnote. The Nair paper is now cited 4 times, which also suggests the article has become about the convergence of LJ rather than the method itself. It describes the method as belonging to chemical engineering, which was merely its first area of application, and is rather confusing, I think. I suggest reinstating the original introduction and moving the discussion about convergence to the aptly named section. Also be careful not to introduce WP:OR. But I will again say that your skill in that area is evident and appreciated.
Third, you have removed references because, apparently, they are not published in journals. Journal publication is not a requirement for inclusion in a Wikipedia article. Indeed, it is not even a requirement for the main source of a Wikipedia article; see e.g. particle swarm optimization, whose main sources from the mid-1990s are conference papers.
Optimering (talk) 08:54, 1 March 2011 (UTC)

context is meaningless

In an edit summary, Optimering states that this article is meaningless without the modification to the q factor. This is puzzling, because Luus and Jaakola seemed to be able to get it published without that modification - so it must have held some meaning to the American Institute of Chemical Engineers Journal. - MrOllie (talk) 12:06, 28 February 2011 (UTC)

You removed the entire section about setting the parameter q. Without this parameter the algorithm, and hence the article, is meaningless. The reference you seem determined to remove is from a verifiable third party; the text contains no opinions, no OR, and no weasel words. It just states a mathematical relationship between variables and hence cannot be biased or a COI. If it were written in the article without a source it would be OR. So this is all plain WP:HARASS on your part and will be reverted. Optimering (talk) 09:14, 1 March 2011 (UTC)
I see what you're talking about now. Try to be clearer in the future (and better yet, use the talk page instead of edit summaries). I have corrected the article to use LJ's original constant. That should remain, because this is an article about the Luus–Jaakola method, not Pedersen's variant of it. (See WP:UNDUE.) - MrOllie (talk) 13:14, 1 March 2011 (UTC)
Just as COI did not apply, neither does UNDUE, for it is not a minority viewpoint; it is a neutral and purely mathematical extension of the original method from verifiable and reliable sources. Indeed, the sources have been cited several times according to Google Scholar despite their quite recent publication. There is nothing controversial here. Following your reasoning, particle swarm optimization would only cite the founding paper (out of almost 12,000 now in existence), airplane would only mention the Wright brothers, and Americas would only mention Columbus. I should also add that the issue of convergence is actually highly controversial - I've heard that Professor Luus completely disregarded the Nair paper. But now the article and introduction give most weight to convergence issues, yet you have not made any edits/reverts to that. So it is again clear that you have specifically targeted me (Optimering) and certain references I post. This is the definition of harassment. Optimering (talk) 13:33, 2 March 2011 (UTC)
That does not follow my reasoning at all. To extend my reasoning to airplanes, I would be against your addition of details of the Boeing 747 to the article Wright Glider. You correctly redirected Local unimodal sampling here because you seemed to recognize that Pedersen's variant lacked notability. Please don't slide back now. - MrOllie (talk) 19:08, 2 March 2011 (UTC)
Have you ever used LJ or LUS? Do you know anyone who has? Have you read any of the papers on them? Are you generally familiar with the body of knowledge in that field? Apparently not.
A gentle soul reminded me of the futility of arguing with certain editors, as humorously described in e.g. WP:CHEESE
Your edits will be reverted once I work out the rest of the article with the other editor. If you persist in your disruptive editing and harassment I will seek to get you banned from editing these articles. Optimering (talk) 07:15, 3 March 2011 (UTC)
Look, you need to get consensus to keep your preferred version of things on the page. Insults, threats, and statements that you will revert war without discussion are not going to help you get consensus. If you think you have a good case to have me banned, the proper place to raise it is WP:ANI. I look forward to seeing your case. - MrOllie (talk) 16:08, 3 March 2011 (UTC)
Dear Optimering, please look at the editing history, especially over the last few days, and see that MrOllie and I have each acknowledged a misunderstanding, of the q factor and of the random variable respectively, and corrected the error once you pointed it out to us.
You are out of line with the personal attacks and threats, as others have told you before. Kiefer.Wolfowitz (Discussion) 22:51, 3 March 2011 (UTC)
Reverting the article back to your version would be tendentious editing, in the WP meaning I linked before. Consider your first sentence, which I paraphrase: "In computer science, LJ is an iterative method ...." This was a defective lead sentence: The article fails to cite any computer science articles on LJ, which isn't an iterative method. Please accept the help that has been given. Kiefer.Wolfowitz (Discussion) 22:58, 3 March 2011 (UTC)
MrOllie, this is not the first time that you filibuster. You clearly have no insight into the subject matter and a distorted understanding of WP rules. Whenever I overturn one of your unsound arguments you come up with another. I thought it would only be fair to warn you before I escalated the dispute to the administrative level. I take your reply as confirmation that you are determined to keep enforcing your point of view. Optimering (talk) 11:58, 4 March 2011 (UTC)

Is Luus–Jaakola stochastic?

I don't see any stochastic element in the description of LJ by Nair. Optimering, can you provide another source for the method? I cannot access the chemical engineering journal you cite. I can access JOTA. Kiefer.Wolfowitz (Discussion) 21:26, 28 February 2011 (UTC)

The random sample is taken from a uniform distribution. Let me suggest browsing Google Scholar if you need other sources.
I seem to recall you have mentioned somewhere else that (meta)heuristics are stochastic as a defining characteristic.
Your statement is a distraction, and your memory (or mine) is incorrect. I have already mentioned Karp, whose one-tree approximation for the TSP is non-stochastic; Papadimitriou has a section on heuristics in his book with Steiglitz, etc. Kiefer.Wolfowitz (Discussion) 16:20, 1 March 2011 (UTC)
Your article has stated that LJ is stochastic. Is LJ stochastic, or is it not? Kiefer.Wolfowitz (Discussion) 16:20, 1 March 2011 (UTC)
That is not required though. The defining characteristic is that they do not use gradients. What is the difference then between heuristic and metaheuristic - good question, the terminology is confusing in this field. In fact, LJ was originally called a 'direct search' method. Optimering (talk) 09:00, 1 March 2011 (UTC)[reply]
Optimering, please specify sources for attributions, especially secondary sources. Who stated that LJ is a "direct search" heuristic: Luus and Jaakola themselves, or a secondary source? What theorem do they state about its convergence? Kiefer.Wolfowitz (Discussion) 16:20, 1 March 2011 (UTC)
LJ uses a stochastic variable, which is also mentioned in the Nair paper. I'm not sure what you're interested in knowing and why, but you may want to obtain the original paper, especially since you're making large edits to the Wikipedia article. Optimering (talk) 13:04, 2 March 2011 (UTC)
Step 2 draws from U[-0.5, 0.5]. The convergence analysis doesn't have any probabilistic content, however, which threw me off.
BTW, the sampling method is slothful: expect only about 1 in 400 sample points to land inside the ball in dimension 10, and fewer than 1 in a million by dimension 18. G. W. Brown suggested generating MVN vectors and scaling them to have norm one, in the 1950s, and his result is sketched in Knuth, volume 2. Kiefer.Wolfowitz (Discussion) 15:28, 2 March 2011 (UTC)
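To put numbers on the sampling point, here is a minimal Python sketch (the function names are illustrative, not from the papers under discussion): it computes the fraction of the smallest enclosing box occupied by the unit ball, which collapses as the dimension grows, and shows Brown's normal-scaling trick, which avoids rejection sampling altogether.

    import numpy as np
    from math import pi, gamma

    def ball_to_box_ratio(n):
        # Fraction of the smallest enclosing box occupied by the unit ball in R^n.
        return pi ** (n / 2) / gamma(n / 2 + 1) / 2 ** n

    def random_direction(n, rng=None):
        # Brown's trick: a standard-normal vector scaled to norm one is
        # uniformly distributed on the sphere S^(n-1); no rejection needed.
        rng = np.random.default_rng() if rng is None else rng
        v = rng.standard_normal(n)
        return v / np.linalg.norm(v)

    for n in (2, 10, 18):
        print(n, ball_to_box_ratio(n))  # roughly 0.785, 0.0025, 3e-7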
No, it is not that inefficient, because the sampling range is being decreased. That is the whole point. I see you have also rewritten the 'motivation' section, which used to explain this quite well. If the parameter q is selected properly it can also handle rather large dimensions in few iterations. ('Few iterations' for a metaheuristic, that is.)
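To make the decreasing-range point concrete, here is a minimal sketch of an LJ-style iteration in Python. This is a paraphrase of the scheme as discussed here, not Luus and Jaakola's published pseudocode; q = 0.95 is an illustrative contraction factor, and the omission of bound clipping is a simplification.

    import numpy as np

    def lj_minimize(f, lower, upper, q=0.95, iters=1000, rng=None):
        # Luus-Jaakola-style random search: sample uniformly around the
        # incumbent and contract the sampling range by q after each step.
        rng = np.random.default_rng() if rng is None else rng
        lower = np.asarray(lower, dtype=float)
        upper = np.asarray(upper, dtype=float)
        x = rng.uniform(lower, upper)   # initial candidate
        fx = f(x)
        d = upper - lower               # initial sampling range
        for _ in range(iters):
            # Draw an offset from U[-0.5, 0.5]^n, scaled by the current range.
            y = x + d * rng.uniform(-0.5, 0.5, size=x.shape)
            fy = f(y)
            if fy < fx:                 # keep strict improvements
                x, fx = y, fy
            d *= q                      # contract the range
        return x, fx

    # Example: a 10-dimensional sphere function.
    x_best, f_best = lj_minimize(lambda z: float(np.dot(z, z)), [-5.0] * 10, [5.0] * 10)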
Sampling from the smallest box enclosing a solid ball (as in Nair's description) is inefficient in dimensions above 4. Look at "concentration of measure", which should have links to Keith Ball's lecture notes or Jiri Matousek's book, both of which explain the facts of life in beautiful fashion. Kiefer.Wolfowitz (Discussion) 18:22, 3 March 2011 (UTC)
You apparently haven't seen the comments I made at the top of this talk page. There are two serious problems with the article as it now stands: 1) The introduction reads like an analysis of what LJ is not rather than what it is. It is written for someone with your particular mathematical background and not for a general audience. It also refers to a mathematical article which does not exist, and it appears to draw conclusions that are WP:OR. 2) The convergence section contains unsourced material (Nemirovski-Judin) as well as conclusions that appear to be WP:OR. (I trust you will note the irony of this in light of your rather vile attack on me, which led to all this.)
Citing Nemirovskii-Judin on the intractability of non-convex unimodal minimization is original research? Maybe you are not a member of the optimization profession: perhaps you might join SIAM/Opt or MOS (formerly MPS) and read something. Kiefer.Wolfowitz (Discussion) 18:25, 3 March 2011 (UTC)
Nemirovskii proved that any method will be bad on such problems. Insofar as the LJ heuristic is a method, it will be bad on such problems.
Let me get this straight. You want me to source a statement that Newton's method is the preferred method for C^2 functions? (The set of C^2 functions was the only class of functions on which this heuristic has a convergence result, in Nair's article.) Kiefer.Wolfowitz (Discussion) 18:30, 3 March 2011 (UTC)
Please fix this or simply revert the article to what I wrote, which was perfectly fine. I know several people who have implemented LJ (well, LUS), from having just read the Wikipedia article. That is not possible as the article currently stands.
Optimering (talk) 07:09, 3 March 2011 (UTC)[reply]
To say that it is "perfectly fine" exaggerates its state. Wikipedia articles must provide some context. Perhaps the introduction provides too much context, and you are welcome to trim and polish, of course. However, reverting back to its previous state would remove all traces of context. Can you not try to find a middle ground? Kiefer.Wolfowitz (Discussion) 20:28, 3 March 2011 (UTC)
RE 'It also refers to a mathematical article which does not exist': I assume you meant the redlink attached to 'locally Lipschitz'. I have fixed the link to point to Lipschitz continuity. - MrOllie (talk) 16:14, 3 March 2011 (UTC)
Thanks for the help! Kiefer.Wolfowitz (Discussion) 18:30, 3 March 2011 (UTC)

What LJ is not

In defining Wikipedia, the WP guidelines include a long statement about what Wikipedia is not. Is this erroneous? (A rhetorical question.)

Distinguishing concepts from neighboring concepts is helpful, particularly when such distinctions help readers understand what LJ is. In practice, LJ is not an algorithm, nor is it an iterative method; rather, it is a heuristic.

In theory, LJ is an iterative method on C^2 functions, for which Newton's method is the standard method.

LJ obviously is not competitive for local minimization of locally Lipschitz functions, because it doesn't use the Clarke subdifferential (or fancier subdifferentials). I may have referred you earlier to recent papers by Overton and Kiwiel.

LJ may well be a heuristic for global optimization (in low dimensions). Does Optimering know of any papers that describe a class of functions on which it may be applied (and be competitive with standard approaches)? Kiefer.Wolfowitz (Discussion) 18:48, 3 March 2011 (UTC)

This is a rather bizarre dispute and I'm getting tired of it so I hope this will be the end of it:
* LJ and its variants are often used for problems that are not differentiable. You are correct that (quasi-)Newton methods etc. are typically used otherwise. I believe I had written that in my original introduction. The simple 'sphere' problem is just used for demonstration and analysis purposes in the literature.
* Whether LJ is competitive with other approaches is completely irrelevant unless you have a reliable and verifiable source that says so. What is relevant for a Wikipedia article is that the subject is notable, the sources are verifiable and reliable, and the article is unbiased and gives due weight to differing points of view. That's all. If you have doubts about notability, I suggest you check Google Scholar. Whether you personally like or dislike metaheuristics in general or LJ in particular is completely irrelevant.
* You are simply incorrect that LJ is not an iterative method. That is precisely what it is, because it iteratively tries to refine a candidate solution, though it is not guaranteed to succeed eventually. You may want to consult a dictionary if you do not understand the terms 'iterative' and 'method', which derive from Latin and Greek respectively.
* Your performance analysis is incorrect because you neglect to consider the decreasing sampling-range, which is the entire idea with the LJ method.
* Your knowledge is based on the convergence paper by Nair, which I hear has actually been dismissed by several researchers. You will note that I basically just mentioned its existence in the Wikipedia article so as to give fair and due coverage to differing points of view.
* You have made unsourced references to Judin and Nemirovskii. You also use their work to analyze LJ and draw your own conclusions. Unless you have a paper that uses their work to analyze LJ, it is WP:OR and must not be included in the article. It may be trivial for you, but it is not trivial for a general audience.
* Your argument that LJ only works for low dimensionality is incorrect; varying and high dimensionality is what Pedersen's extension to LJ deals with. Unless you have a reliable and verifiable source that says otherwise, your statement is WP:OR and must not be included.
You stated that the method is suited for high-dimensional problems. This claim goes against the basic results of global optimization (Yudin and Nemirovsky) and against Nair's own analysis of C^2 functions.
* LJ is not a method specifically for chemical engineering; that was just one area of application, because it happens to be Prof. Luus's field. The best and most generally applicable categorization I can offer is that it is an 'iterative method' in 'computer science', a 'metaheuristic', as I had originally written in the introduction.
* It is not very informative to make a whole list of what the subject of a Wikipedia article is not, rather than what it is - especially in an introduction. I originally made a link to metaheuristic in the introduction, which then explains what a metaheuristic is and is not, and I also briefly touched upon differentiability, quasi-Newton methods, etc. in the introduction. This provided ample context.
* The terminology you use is very mathematical, some of it rather advanced. It is a Wikipedia article in computer science and not mathematics so it should use suitable terms and language.
If you have no deep knowledge of metaheuristics in general, no experience and expertise with LJ specifically, and if you haven't even read the original paper on LJ, then why are you even editing this article? You and MrOllie have managed to destroy this article and waste my time. I'm not a psychologist so I honestly don't know why, but I have come to the end of what I will tolerate. Wikipedia is not a message board nor is it a school where you can 'Ask The Professor.' If you have no competence on a given subject you should not edit the article.
Optimering (talk) 11:47, 4 March 2011 (UTC)[reply]
You've brought this up several times now, so I want to make sure that you know that the Wikipedia community rejects this idea of yours that you must be an expert to edit an article. If you feel that sort of rule is necessary, you might be happier at an encyclopedia project that grants special authority to subject experts, such as citizendium.org. - MrOllie (talk) 15:45, 4 March 2011 (UTC)

Iterative methods

Read Knuth for the definition of iterative methods, which is used in optimization, computer science, scientific computing, etc. Computational scientists think that the distinction between heuristics and iterative methods is important, as is the distinction between algorithms and iterative methods. Again, ask for an opinion at the computer science project if you think that these distinctions should be abandoned. Kiefer.Wolfowitz (Discussion) 14:40, 4 March 2011 (UTC)

I added the requested reference to Yudin & Nemirovsky, which explains the painful growth in complexity for unimodal functions that are not convex, which should be basic knowledge to anybody working in this area.
You have declared your expertise many times, but you should recognize that you are citing papers that have not appeared in the better optimization journals, which (following Wolfe's satire on such "methods") require ideas, computational testing (against the best competitors), or a convergence analysis. I remain unimpressed by the editing or by the pouting on talk pages. Kiefer.Wolfowitz (Discussion) 16:11, 4 March 2011 (UTC)
The article leaves me somewhat puzzled. This is supposedly a method for use on functions for which the derivatives are not available. Then there's a convergence analysis for cases where the derivatives are available. This looks like an obvious method for stochastic optimization in a high-dimensional space: pick a point, try points in some random direction at some distance, and if nothing looks better, try again with a shorter distance. Been there, done that. Also, it looks like "d" always decreases, so if this thing gets into a bumpy area and "d" is reduced, it never gets bigger again, even if the search moves into less hostile terrain. Is that right? --John Nagle (talk) 22:56, 4 March 2011 (UTC)
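Nagle's reading matches the sketch above: d only shrinks. A hypothetical variant, not part of the original LJ, re-expands the range after a success so it can grow again in friendlier terrain; a minimal sketch:

    import numpy as np

    def lj_adaptive(f, lower, upper, q=0.95, iters=1000, rng=None):
        # Like lj_minimize above, but d re-expands after a success, so the
        # range can grow again in friendlier terrain (hypothetical tweak).
        rng = np.random.default_rng() if rng is None else rng
        lower = np.asarray(lower, dtype=float)
        upper = np.asarray(upper, dtype=float)
        x = rng.uniform(lower, upper)
        fx = f(x)
        d = upper - lower
        for _ in range(iters):
            y = x + d * rng.uniform(-0.5, 0.5, size=x.shape)
            fy = f(y)
            if fy < fx:
                x, fx = y, fy
                d = np.minimum(d / q, upper - lower)  # widen after a success, capped
            else:
                d *= q                                # contract after a failure
        return x, fx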
It is difficult to find references to this heuristic in optimization or numerical analysis or computer science journals, although it seems popular among chemical engineers. I mentioned on the COI noticeboard that Rinnooy Kan and Timmer ignored this heuristic in their handbook-of-optimization article on global optimization.
Above, Optimering states that Luus downplays issues of convergence. There are a number of similar articles without convergence results on Wikipedia---see no free lunch theorem for a particularly bizarre assertion (which fails to specify the probability measure on the class of all functions over which methods are averaged!). Perhaps a fire alarm should be raised at the mathematics and computer science wikiprojects? Kiefer.Wolfowitz (Discussion) 13:54, 5 March 2011 (UTC)

Compare to Implicit Filtering

(No time right now to write something complete.) It would be good to compare Luus–Jaakola to Kelley's implicit filtering. Both sample the objective and gradually reduce the search range.
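Pending a proper comparison, here is a deliberately stripped-down sketch of the implicit-filtering idea for contrast with the LJ sketch above. This is a simplification: Kelley's actual method builds a quasi-Newton model on the stencil and handles bound constraints, none of which is shown here.

    import numpy as np

    def implicit_filtering(f, x0, h0=0.5, h_min=1e-6, max_steps=200):
        # Estimate the gradient by central differences at stencil width h,
        # step along the negative gradient, and halve h when the stencil
        # finds no descent ("stencil failure").
        x = np.asarray(x0, dtype=float)
        fx = f(x)
        h = h0
        for _ in range(max_steps):
            if h < h_min:
                break
            g = np.zeros_like(x)
            for i in range(len(x)):
                e = np.zeros_like(x)
                e[i] = h
                g[i] = (f(x + e) - f(x - e)) / (2.0 * h)
            norm = np.linalg.norm(g)
            y = x - h * g / norm if norm > 0 else x
            fy = f(y)
            if fy < fx:
                x, fx = y, fy   # descent at this scale succeeded
            else:
                h *= 0.5        # stencil failure: reduce the scale
        return x, fx

Where LJ samples randomly inside a shrinking box, implicit filtering differences the (possibly noisy) objective on a shrinking stencil, which is what makes the two worth comparing.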

Here are some links for Implicit Filtering:

https://epubs.siam.org/doi/book/10.1137/1.9781611971903 Implicit Filtering, C. T. Kelley, SIAM (Software, Environments and Tools), 2011

https://projects.ncsu.edu/crsc//reports/ftp/pdf/crsc-tr02-28.pdf A Brief Introduction to Implicit Filtering, C. T. Kelley, September 14, 2002

https://ctk.math.ncsu.edu/iffco.html Implicit Filtering (Group of people working in this area.)

https://ctk.math.ncsu.edu/imfil.html Links to MATLAB code, the book, and papers

https://github.com/chvillap/PSIFA PSIFA (Pattern Search and Implicit Filtering Algorithm) is a derivative-free optimization algorithm developed by Ferreira, Ehrhardt and Santos (2017) that has been designed for linearly constrained problems with noise in the objective function. It combines some elements of the pattern search approach of Lewis and Torczon (2000) with ideas from the implicit filtering method of Kelley (2011).

https://pypi.org/project/SQImFil/ ImFil is intended for optimizing derivative-free, noisy, blackbox functions. This modified version has preset defaults intended for hybrid quantum-classical algorithms run on Noisy Intermediate-Scale Quantum (NISQ) computers. This version of ImFil was modified and redistributed with permission. — Preceding unsigned comment added by Ericjster (talk · contribs) 22:38, 4 February 2021 (UTC)