Talk:Backtracking line search

Dispute over content added in September 2020

I would like to bring here my dispute with user Potaman (from now on P, to save time), who keeps deleting my edits. I hope that the community can help resolve it.

What I originally added was: 1. A section about some modifications of Backtracking Gradient Descent. 2. A section about bounds for the learning rates. 3. A section about implementation in Deep Neural Networks. 4. A section about theoretical guarantees. Everything I wrote has clear references that people can read. I added 3 new references, including 1 paper of mine, which is the first source containing some of what is written in the above 4 sections. The purpose of my writing was to add more useful information on this optimisation method. As I wrote to P on one talk page, this is not anything obvious: search on Google and you can find claims that Backtracking GD cannot be implemented in large scale optimisation.

Then P deleted all I wrote and chanted about the "Self-citation" rule of Wikipedia. I have been very polite and professional toward P all the time. When I asked whether the rule prevents authors from citing their own papers, P did not reply directly but kept deleting. (Later, user WikiDan61 noted that Wikipedia does allow authors to write about their own results if that is necessary and brings new and useful knowledge, and I checked Wikipedia's rules myself.)

Then P chanted that papers included in Wikipedia must have "enough citations" and be "generally accepted". I asked P precisely what these phrases mean and which rules of Wikipedia apply to them. I also gave P examples, such as the paper on the Adam method, which attracted many citations but is very wrong. P did not reply but kept deleting and chanting about "self-citation", "enough citations" and "generally accepted".

On one talk page, P declared that P has no conflict of interest in deleting what I wrote. Later, on another talk page, P revealed that P happens to know about so-called "quadratic and cubic interpolation backtracking", and accused me of self-promotion because I did not write about these. I then wrote that I do not know enough about these things to write about them, and invited P to write about them if P pleased, including enough theoretical and experimental results so that people can see how good these methods are. P then revealed that P is an engineer, and added a link to a paper on that. (So it is very likely that P has a conflict of interest.) I replied that I don't have the responsibility to do so: this is Wikipedia, and people can write the parts in which they feel competent.

User Eggishorn said that we should cite papers and so on. That is what I did at the beginning, but I undid it to satisfy P. User Maproom then wrote that I need to cite "secondary" sources to be "Wikipedia worthy". I am confused about why I should cite a secondary source rather than the original source.

I now have a shorter version while discussing here.

Now here is what I think:

- First, even my first versions were very neutral in tone. Later, when I removed my paper from the reference list, the text became even more neutral.

- P is confused between what P thinks one cannot write on Wikipedia and what Wikipedia's rules actually allow.

- P is trying to obstruct my contribution to spreading new and useful knowledge, relying on rules which P made up themselves.

- I want to know which rules of Wikipedia pertain to what Maproom mentioned about being "Wikipedia worthy".

- Now concerning the content:

Section about some modifications: These are modifications that are clearly written and explained, and no outside source is needed for a reader to understand and apply them. Also, this is clearly new information, and as the examples given show, these versions are better than the original ones, at least for those examples. (Also, if one wants to dig more deeply, there is source code online.)

Section about bounds on learning rates: What is mentioned here are mathematical theorems. Proofs of mathematical theorems are either right or wrong, regardless of "number of citations", "generally accepted" and so on.

Section about implementation in Deep Neural Networks: This concerns experiments, and the source code is available, so one can readily check if one likes. We have checked these results hundreds of times; another group in France did experiments and informed us of similar results, and a different group mentioned in my writing (from Canada, to be precise) announced similar experimental results. I invited P, if in doubt, to run the code.

Section about theoretical guarantees: again, this section is about mathematical theorems, and their proofs are either right or wrong, no matter what P thinks.

I have presented all the facts. Now I look forward to comments/ideas from the community. Thanks. Tuyentruongoslo (talk) 07:44, 25 September 2020 (UTC)

P also complained about not being able to find my paper in the top ten when searching for the phrase "Backtracking line search" on Google Scholar. To satisfy P's preoccupation with rankings/citations, I suggested that P search for the phrase "Backtracking gradient descent" instead. P has not yet told me what P found. Tuyentruongoslo (talk) 07:51, 25 September 2020 (UTC)

As explained in the Wikipedia link sent by user Maproom to me, the so-called secondary source applies only to events. In the case of this Wikipedia article, I am using peer reviewed papers, which are considered as "Reliable sources" and are listed as the first among trusted sources for Wikipedia articles. So I don't understand why P keeps chanting about "uncited" things, even just before I write these lines.

P: Which rules of Wikipedia did you rely on to delete my edits, even the most recent one? — Preceding unsigned comment added by Tuyentruongoslo (talk • contribs) 16:10, 25 September 2020 (UTC)

There is no citation for any of the things you added. How is someone to check them? I find it very wrong that you're using a public resource to push your work. A lot of mathematical results are published every year, and they are not all immediately added to Wikipedia EVEN IF THEY ARE 100% correct. I find that adding very current research which does not seem to have general acceptance (citations, being part of a commercially released/widely used toolkit) is not the right thing to do in an encyclopedia, especially when the previously accepted page doesn't even fully describe the most commonly implemented backtracking algorithm. It doesn't have the proof of convergence from Nocedal and Wright or from Dennis and Schnabel. You could spend your time making those edits; instead you're trying very hard to promote your own work. Potaman (talk) 19:18, 25 September 2020 (UTC)

Potaman: First off, the last version has many citations of papers and books published several years ago, with a lot of citations, as I warned you in the History page. For example, I wrote "For example, if the cost function is real analytic, then it is shown in the paper by Absil et al that convergence is guaranteed." So you did not read any of my writing, and just undid everything. You are lying that there are no citations, and you have been caught!

Second, I explained above that this is not to push my work, but to introduce new and useful algorithms/results. It includes material from many papers.

Third, whether you feel that I am wrong or not does not matter. What matters is that what I am doing here is legal by Wikipedia's rules, while what you are doing (deleting my writing without any Wikipedia rule that allows you to do so) is illegal. Tuyentruongoslo (talk) 20:41, 25 September 2020 (UTC)

Potaman: Concerning "proof": which convergence proof in those two books do you mean? Just write it clearly here and then I will answer.

Concerning implementation: Please show me source code of your favourite version of Backtracking Gradient Descent which works well for Deep Neural Networks (for example, on CIFAR) and which appeared before our GitHub source code. Ours is here:

https://github.com/hank-nguyen/MBT-optimizer

Tuyentruongoslo (talk) 21:05, 25 September 2020 (UTC)

— Tuyentruongoslo (talk • contribs) has made few or no other edits outside this topic.

There is also some recent relevant discussion at Wikipedia:Editor assistance/Requests#Request resolving dispute with user Potaman considering my edit on the page "Backtracking line search".

Wikipedia is intended to summarize well-known, widely accepted information about each of the topics it discusses, not to introduce new research. It is not intended as a place "to introduce new and useful algorithms/results". This article should contain content that is basically similar to a chapter in an average introductory textbook that discusses the topic of backtracking line search. It should be similar to a Chapter 1 of such a textbook, not the Chapter 17 at the back where a professor might talk about their favorite specialised related niche. If the average professor teaching an introductory course who has been assigned to teach some students about backtracking line search would not discuss some particular aspect (e.g., the application of backtracking line search to deep neural networks and the comparative precision of something called "CIFAR10 on Resnet18", whatever that is, which is clearly something that most readers will not know about), then that should not be discussed in the Wikipedia article.

A Wikipedia article should also not contain something like a "NOTE TO OTHER EDITORS:" that talks about an "ongoing dispute with another user". Discussions with other editors belong on article Talk pages like this one, not within the content of the article that is presented to readers who are not interested in editing the article.

Starting at the top of the article, I noticed a sentence that you added that said "Another name for this algorithm is Backtracking Gradient Descent." When I search the web for the phrase "Backtracking Gradient Descent" using Google Advanced Search, I am surprised to find that very few sources seem to use that particular term. In fact, essentially all of the sources that I find that use that exact phrase seem to be published by "TT Truong" or "Truong, et al". Since that does not seem to be a term that is widely used in the literature, it should not be in the Wikipedia article.

Some relevant Wikipedia guidelines and policies are found at WP:COI.

Please do not use capital letters at the beginning of each word in multi-word terms like "deep neural networks" and "stochastic gradient descent" and "backtracking gradient descent". That is contrary to Wikipedia style. Some relevant guidelines are at MOS:CAPS.

—BarrelProof (talk) 14:22, 26 September 2020 (UTC)

BarrelProof Thank you. As for "Deep Neural Networks", that is the common way people write it in the literature. I don't see anyone write "deep neural networks". Now, except for CIFAR10 and Resnet18 (which the general public would not know, but which a person working in Deep Learning, who would be the most interested in this material, will know), the other things I wrote in this article can be taught in an introductory class, as you wish. Tuyentruongoslo (talk) 16:00, 26 September 2020 (UTC)

I suggest having a look at the Wikipedia article titled "Deep learning". It is also the target of the phrase "deep neural network" on Wikipedia. Please notice that neither of those phrases is capitalized in that article. More information about that aspect of Wikipedia style can be found at MOS:CAPS. —BarrelProof (talk) 05:07, 28 September 2020 (UTC)

WikiDan61 As I wrote in the history page, the 4 papers and books now in the reference list are well sourced and cited by many. So I think the current text I wrote in the Wikipedia article, which uses only these, satisfies whatever Wikipedia requires.

Of course, it is true that published results can later be found to be incorrect. However, most of what I wrote is really easy to check (like 1+1=2, but in the context of Optimisation Tuyentruongoslo (talk) 16:23, 26 September 2020 (UTC)), so Potaman can check it themselves (by pen and paper for most of what I wrote, or on a personal computer with Python in a few minutes for most of the rest, except CIFAR10 and Resnet18, which require access to a GPU, but this is easily available at many companies). As I mentioned in the discussion with Maproom, if Potaman finds something incorrect, then P can argue from that.

Considering reliable sources, which you linked, I found the following: "Material such as an article, book, monograph, or research paper that has been vetted by the scholarly community is regarded as reliable, where the material has been published in reputable peer-reviewed sources or by well-regarded academic presses."

Moreover, the results about CIFAR10 and Resnet18, as I wrote, have been checked by at least 2 groups, both in published papers. Would that be enough for you?

One more question: Is there any deadline for resolving disputes? What happens if Potaman does not participate anymore in this Talk page? Tuyentruongoslo (talk) 16:16, 26 September 2020 (UTC)

@Tuyentruongoslo: If you are relying on prior published work for your additions, it would be helpful to cite your sources inline for the material added. This article is already problematic in that it does not properly use inline citations for its material; you should not exacerbate the problem by adding significantly more material that is not properly cited. You do not have the responsibility to fix the text that existed prior, but you should properly cite the material that you wish to add. WikiDan61^ChatMe!_ReadMe!! 15:04, 27 September 2020 (UTC)

WikiDan61 If you had read what I wrote, which you just deleted, you would have seen that I cited a lot inline. For example, in "Lower bounds for learning rates" I wrote "at least since Armijo's paper", and in "Algorithm in practice" I wrote "Bertsekas' book". I don't know why you claim that I did not cite. Maybe there are one or two sentences that I did not cite inline, but why delete the whole thing? — Preceding unsigned comment added by Tuyentruongoslo (talk • contribs) 15:25, 27 September 2020 (UTC)

As long as you realize that Wikipedia is not a place to push your research, as your original edits were doing and even the latest ones are (backtracking gradient descent is a term only you use), have at it. I have better things to do. If you are unaware of the standard implementation (Alg. 6.3.1 in Dennis and Schnabel), implemented in PETSc, LineSearches.jl, SciPy etc., which have 1000s of users, I can't help you. I don't have an algorithm of mine to push. Most users of Wikipedia are going to come to backtracking line search through these algorithms, not through specialized libraries for Deep Neural Networks or for Stochastic Gradient Descent, so the question of whether your algorithm performs better on CIFAR10 or Resnet18 doesn't really help, even if it is better. For people who come to this article from these very, very common implementations, your edits make no sense and are not helpful. It is the chapter 1 versus chapter 17 distinction that BarrelProof mentioned, especially since the LineSearch page itself is not particularly well filled. Again, the point of edits on Wikipedia is not "I am correct, therefore it gets on Wikipedia"; it is "does it help people learn about this?" Your edits don't help people learn. Your original intent was to push your research; now you're just trying to have the last word. Go ahead, have the last word. I have better things to do. https://github.com/scipy/scipy/blob/master/scipy/optimize/linesearch.py https://github.com/JuliaNLSolvers/LineSearches.jl/blob/master/src/backtracking.jl https://www.mcs.anl.gov/petsc/petsc-current/src/snes/linesearch/impls/bt/linesearchbt.c.html#SNESLINESEARCHBT Potaman (talk) 17:37, 26 September 2020 (UTC)

Potaman

I realise that Wikipedia is the place where people can learn everything that is useful, whether basic or highly specialised. So if someone has some useful knowledge to help people, and does not violate any of Wikipedia's rules, then one has the right to edit here. Folks who disagree must present valid reasons, based solidly on Wikipedia's rules. My view does not need to be the same as yours, and I respect your rights and opinions, but I request that you conform to Wikipedia's rules. This is not your own or your friend's blog. Everyone has equal rights here.

First off, since you seem not to know about Deep Neural Networks, which run things on your cellphone (such as the Siri assistant), let me explain a bit. These networks need to be run, and they are very expensive to run. Hence, a better optimisation method helps reduce energy consumption, and is good for many things (energy, environment, money and so on). This is my purpose in writing this, not to push my results.

Second, I don't know why you are so sure that most people coming here are not looking for methods for running Deep Neural Networks effectively. Source, please.

Third, I am writing from what I know, that is, optimisation methods at large scale, with applications in Deep Neural Networks. If you like the other algorithms, then feel free to write about them. Of course, my edits may not help the people you mentioned, but I am sure they will help other people. That is enough for me, and I think that is the purpose of Wikipedia. If you would like to help a particular community of people, then feel free to write for them. I don't have the responsibility to make everyone happy.

Fourth: The version in the Algorithm section of this article is the standard definition of this method. This method is very simple to implement, so I can write the code myself and don't need the implementations you mentioned.

Fifth: again, source please for your sentence "Your edits don't help people learn". You can google and see that many people have trouble understanding this. Anyway, your own opinion does not give you the right to delete my edits.

Last: Now coming back to the main point of this Talk page. Do you have any valid Wikipedia rules on which to base deleting my edits? Do you object to my citing my paper? If so, please cite valid Wikipedia rules.

Potaman By the way, you speculate too much about my intentions. It's up to you; as I wrote, you have your own opinion and I have mine, and to me it does not matter too much.

Anyway, after reading your reply again, I understand that you don't have any objection now. Please confirm. Tuyentruongoslo (talk) 18:58, 26 September 2020 (UTC)

Tuyentruongoslo (talk) 18:11, 26 September 2020 (UTC)
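For readers following the exchange above about how simple the basic method is to implement, here is a minimal Python sketch of backtracking with Armijo's condition along the steepest-descent direction. It is only an illustration under stated assumptions: the function names, the parameter names (delta0, alpha, beta) and the quadratic test problem are hypothetical and are not taken from the article, from any editor's paper, or from the libraries linked above, and it does not include the interpolation steps used by some of those implementations.

```python
import numpy as np

def backtracking_step(f, grad_f, x, delta0=1.0, alpha=0.5, beta=0.5, max_shrinks=50):
    """Shrink delta from delta0 by factor beta until Armijo's condition holds
    along the steepest-descent direction -grad_f(x)."""
    g = grad_f(x)
    g_norm_sq = float(np.dot(g, g))
    fx = f(x)
    delta = delta0
    for _ in range(max_shrinks):
        # Armijo's condition: sufficient decrease relative to the step length.
        if f(x - delta * g) <= fx - alpha * delta * g_norm_sq:
            break
        delta *= beta
    return delta

def backtracking_gd(f, grad_f, x0, n_iters=100, **kwargs):
    """Gradient descent where every step size is chosen by backtracking."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iters):
        delta = backtracking_step(f, grad_f, x, **kwargs)
        x = x - delta * grad_f(x)
    return x

# Hypothetical test problem: f(x) = 0.5 * x^T A x, minimised at the origin.
A = np.diag([1.0, 10.0])
f = lambda x: 0.5 * x @ A @ x
grad_f = lambda x: A @ x

x0 = np.array([3.0, -2.0])
first_step = backtracking_step(f, grad_f, x0)
x_min = backtracking_gd(f, grad_f, x0, n_iters=500)
```

On this test problem the first accepted step is smaller than delta0, and the iterates approach the minimiser at the origin; other choices of alpha and beta change how many shrinks are needed but not the structure of the loop.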

By the way, every third party in this dispute has agreed with me. When you add material that is three times the length of the original article and is based entirely on your own published research, THAT IS CLEAR SELF-PROMOTION. VERY VERY VERY CLEAR SELF PROMOTION. I know what all those things you mention are; you seem to have no idea of anybody outside your world. Your additions to the article are completely opaque to anybody who hasn't taken a class in analysis. While it's nice to imagine that everybody reading this article has read an analysis text, it is not true. Again, you may not need to implement the algorithm, but people are using those algorithms; they are going to see the code and want a better understanding of what's going on. Wikipedia is also for those people.

Let's consider this sentence: "A lower bound for learning rates satisfying Armijo's condition is in the order of 1/L(x), where L(x) is a local Lipschitz constant for the gradient ∇f near the point x. This is known at least since the paper by Armijo."

Do you really think that this sentence is clear to anyone who doesn't have a background in analysis? — Preceding unsigned comment added by Potaman (talk • contribs) 19:16, 26 September 2020 (UTC)
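For context on the sentence quoted above, the standard argument behind a lower bound of order 1/L can be sketched as follows. This is a hedged outline, not a quotation from the article or from any cited paper. The symbols α (Armijo constant, 0 < α < 1), β (shrink factor, 0 < β < 1) and δ₀ (initial step) are notation assumed here, and L stands for a Lipschitz constant of ∇f on a neighbourhood of x that is assumed to contain the trial points.

```latex
% Assumption: \nabla f is L-Lipschitz on a neighbourhood of x containing
% the trial points x - \delta \nabla f(x).
\begin{align*}
f\bigl(x - \delta \nabla f(x)\bigr)
  &\le f(x) - \delta\,\|\nabla f(x)\|^2 + \frac{L\,\delta^2}{2}\,\|\nabla f(x)\|^2
  && \text{(descent lemma)} \\
  &\le f(x) - \alpha\,\delta\,\|\nabla f(x)\|^2
  && \text{whenever } 0 < \delta \le \frac{2(1-\alpha)}{L}.
\end{align*}
% Backtracking only shrinks a step that failed Armijo's condition, so the
% last rejected step exceeds 2(1-\alpha)/L and the accepted step satisfies
\[
  \delta \;\ge\; \min\!\Bigl(\delta_0,\ \frac{2\beta(1-\alpha)}{L}\Bigr).
\]
```

In words: every sufficiently small step passes Armijo's condition, so the step that backtracking accepts cannot be much smaller than a constant multiple of 1/L.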

Potaman The other people talk about Wikipedia's rules, and I checked Wikipedia's rules and I don't violate anything.

You can keep chanting "Self-promotion", as you did before, and you were caught lying. It's up to you. It is just your own opinion. [And to help make this Talk page complete, I repeat here things that I wrote to you on other talk pages, with a small addition: An article in Wikipedia is a collaboration of many editors. So don't expect anyone to write a particular thing you would like to see, and don't accuse them if they don't. If you know things, then why not write them yourself, to help Wikipedia and people? Now, I only write about things in which I am competent, and I also need time to write gradually. The things that I wrote in the first versions, and that you repeatedly deleted without proper valid reasons, except chanting rules you made up yourself, are useful (they work well in experiments), theoretically guaranteed (there are mathematical theorems for them, which I did not mention much, to keep things easy to understand for an average reader; maybe later if it is deemed necessary) and known to me first hand. Now, why I cited my paper, as I wrote to you before on some talk pages, is that my paper is the first one where these things were given. Tuyentruongoslo (talk) 02:28, 27 September 2020 (UTC)]

Concerning the sentence you asked about: you seem to say that a person who comes here to look for Backtracking line search doesn't understand analysis or calculus? Are you kidding? Also, even if that were the case, is your decision to delete everything I wrote a reasonable one?

What is your real purpose in deleting the posts?

I don't want to waste time with you. The more you write, the more I feel that you lie. Now just answer the questions to settle this dispute:

Do you still contest my citing my paper? If not, then please confirm, and I will go ahead. If yes, then please give valid reasons based on Wikipedia's rules, and I will reply to those reasons.

WikiDan61 Could you please confirm whether there are deadlines for resolving disputes? Tuyentruongoslo (talk) 20:05, 26 September 2020 (UTC)

P.S. I did not see your sentence about "people use the codes and come here to understand more, but don't know analysis" on first reading. Here is my reply: if they really want to understand, then they need to study, for example by looking at the references. Wikipedia is not a textbook; it gives some hints, but you need to understand some basics. What I added was for people who already knew what was in this "Backtracking line search" article before my additions. If they don't understand, then they can look at the references. There is a fine balance in how much to add. For people who only want to use the methods, there is source code in the papers that they can use. You can add links to other Wikipedia pages for help if you like. Tuyentruongoslo (talk) 20:49, 26 September 2020 (UTC)

@Tuyentruongoslo: You stated above that "other people talk about Wikipedia's rules, and I checked Wikipedia's rules and I don't violate anything". One of the main rules at Wikipedia is the rule of consensus: in any case of dispute, we must reach a point of consensus. In general, when multiple editors have come into the discussion and disagreed with your point, you have not yet reached consensus. Now, to answer your question: no, there is no deadline. WikiDan61^ChatMe!_ReadMe!! 15:04, 27 September 2020 (UTC)

WikiDan61 If you disagree with me, please let me know which Wikipedia rules you think I violated, and then I will answer you. Also, my last edits have citations from other papers and books, not my paper, so why did you delete them?

WikiDan61 The first sentence I see in the rule about consensus which you linked is this: "In determining consensus, consider the quality of the arguments, the history of how they came about, the objections of those who disagree, and existing policies and guidelines. The quality of an argument is more important than whether it represents a minority or a majority view. The arguments "I just don't like it" and "I just like it" usually carry no weight whatsoever." Now, if you look at Potaman's argument, he chanted "Self-promotion" and so on, which is the same as "I just don't like it". Concerning the deadline, do you mean that if Potaman keeps stalling by posting statements like "What you edit was not useful for people like this", then P can keep the page stalled? Is it that simple to delete every Wikipedia page? Now, as I wrote, it is better, and I think I can request this, that P, or you, or whoever disagrees, post which rules of Wikipedia I may have violated, and then I will answer. Maybe in the first post I did not know the rules, but from the shorter version (which P also deleted without reasons) until now, I have been aware of them and have not violated them. For example, see my answer to user BarrelProof below.

WikiDan61 In contrast to what you wrote, "No, there is no deadline", I found this at the very link which you sent. I think I now have the right to ask whether you are biased in this case.

View three: Don't postpone dispute resolution Whether the addition/removal to the article can be justified or not, it is sometimes better to handle the dispute at the time it occurs. Generally referenced additions can be viewed and evaluated by other users more easily, since it is much easier than tracking the additions / removals from article history, and generally "let it go" cases are forgotten after a while, unless an editor bothers to check every single entry in article history. Also discussing cases after a while may consume much more time than early solved conflicts since non-solved conflicts generally turn out as personal conflicts between editors. Moreover, since editors try to edit in their free time where they can do anything else, they may not find such time in the future to edit or discuss these matters to improve Wikipedia. And it is frequent that some users act WP:POV or WP:BIASed (and WP:Systemic bias in the worst cases) because of their political or religious views or they may not have any expertise in the article they edit. From time to time they may have WP:COI, or act like they WP:OWN the article, they may take things personally and may not be WP:POLITE (verbally or worse with their editing style) so, whether or not you assume WP:GOODFAITH, you may not come to an agreement. At those times, you may seek third party review help from uninvolved editors to come to an agreement between both parties.

WikiDan61, also Potaman: I think both of you violate this, from the same link as "No, there is no deadline" above, in particular with regard to all I wrote after the shorter version was put up, when I first opened this talk page. Clearly, we can see here that Wikipedia encourages improvement through cordial discussion, rather than quick deletion as you both did, repeatedly.

View two: Don't rush to delete articles (WP:RUSHDELETE). We can afford to take our time to improve articles, to wait before deleting a new article unless its potential significance cannot be established.

Wikipedia is not a paper encyclopedia and has no need to work towards a deadline. There is no finished version expected soon, and it is perfectly acceptable to let the editing process fashion an article up to our standards eventually. And if it takes a long time for that process to work, so what? Wikipedia is a work in progress, and will always remain so. There is no publication date and Wikipedia does not have to be finished today. It merely needs to have improved on yesterday. Perfection is neither desired nor achievable.

Remember also that consensus can change over time. New people may bring fresh ideas, established users may change their minds when new things come up, and we all may find a better way to do things.

Above all, the principle of creating an article which is unfinished was once a consequence of the now historical second rule of Wikipedia, Always leave something undone (though the present procedural policy no longer discusses this). By creating an unfinished article, you encourage other people to contribute; collaboration on articles will earn you far greater respect than solo editing.

BarrelProof Concerning your link about Conflict of Interests, here I copy the sentence which is relevant to this case: "Using material you have written or published is allowed within reason, but only if it is relevant, conforms to the content policies, including WP:SELFPUB, and is not excessive." I think I don't violate this.

BarrelProof Potaman WikiDan61 Since only 3 of you and I are til now in this dispute, this is not offend you, but to make sure that we all are treated fairly, can we all declare no conflict of interests, that is no of us are friends to one other? I can do first: I declare that I am not friend of any of you 3, before or now, as far as I can tell from looking to your Wikipedia's user accounts.

— Preceding unsigned comment added by Tuyentruongoslo (talk • contribs) 19:48, 27 September 2020 (UTC)[reply]

BarrelProof, Potaman, WikiDan61: Now that I have added a lot of new material pertaining to other people's papers and books, do any of you still object to me citing my paper where it is relevant and not excessive? Just name the Wikipedia rules, and I will answer. Is it reasonable to ask for a 1-week deadline for you to provide the answer? (I don't know why it should take longer just to write something like "violates this rule, that rule and that rule".) If after 1 week I don't see any reply to this, then I will assume that you are all OK with it and have no more objections. Tuyentruongoslo (talk) 08:38, 28 September 2020 (UTC)

I think the primary question is not just a matter of self-citation, COI or verifiability. It is a matter of whether the information being added is of sufficient interest for someone who is seeking an introduction to backtracking line search. This is a question known as WP:UNDUE (and possibly WP:NOTABILITY). There are also some significant problems of writing style as well. For example, the lead section of the article currently contains this phrase "such as Newton's method Newton's method (if the Hessian Hessian matrix is positive definite Definite symmetric matrix)". That is just a mess. The writing is not fluent and not following the Wikipedia style. —BarrelProof (talk) 16:55, 28 September 2020 (UTC)

I will reply to all your points below. Some of them have been replied before, in various talk pages.

Concerning COI and self-citation: I quote the relevant text from Wikipedia: "Using material you have written or published is allowed within reason, but only if it is relevant, conforms to the content policies, including WP:SELFPUB, and is not excessive." I don't think I violate this.

Concerning reliable sources and verifiability: Wikipedia says: "Material such as an article, book, monograph, or research paper that has been vetted by the scholarly community is regarded as reliable, where the material has been published in reputable peer-reviewed sources or by well-regarded academic presses." I think it is OK. Besides, as before, I invite anyone who has doubts to run the source code or check the mathematical arguments to verify. Concerning experimental results, as I mentioned, there are also reports from 1 group in France (reported to me) and 1 other group in Canada (in a published paper). Concerning citations: as I wrote to Potaman, a paper appearing later can get more citations than a paper appearing earlier, for various reasons.

Concerning Undue: since this article is about Backtracking line search, and what I wrote describes helpful modifications and uses of this method, I don't think I violate this. Quoting Wikipedia: "Wikipedia should not present a dispute as if a view held by a small minority is as significant as the majority view. Views that are held by a tiny minority should not be represented except in articles devoted to those views (such as Flat Earth)."

Concerning Notability: There are two things here. First, what I wrote is in the "Backtracking line search" Wikipedia article, and is not a separate Wikipedia article about my paper alone. I quote Wikipedia: "Notability guidelines do not apply to content within articles or lists (shortcuts: WP:NNC, WP:NLISTITEM, WP:NOTEWORTHY). The criteria applied to the creation or retention of an article are not the same as those applied to the content inside it. The notability guidelines do not apply to contents of articles or lists (with the exception of lists which restrict inclusion to notable items or people). Content coverage within a given article or list (i.e. whether something is noteworthy enough to be mentioned within the article or list) is governed by the principle of due weight and other content policies. For additional information about list articles, see Notability of lists and List selection criteria." Second, even if one does not take the first point into account, what I wrote in the points above covers this.

Concerning style: This is because I am not yet used to writing on Wikipedia. This can be learned and improved. I quote Wikipedia again (from "View two: Don't rush to delete articles"): "Above all, the principle of creating an article which is unfinished was once a consequence of the now historical second rule of Wikipedia, Always leave something undone (though the present procedural policy no longer discusses this). By creating an unfinished article, you encourage other people to contribute; collaboration on articles will earn you far greater respect than solo editing."

If you have other opinions, I am happy to reply. Tuyentruongoslo (talk) 11:28, 30 September 2020 (UTC)

Concerning your point about whether the material can be included in an introductory textbook: Yes. First of all, the algorithms are really elementary to understand. It is like how you can explain Fermat's Last Theorem to a high school student, even though probably not more than 50 people have the time or expertise to really understand Wiles' proof. You can compare the algorithms I included, which were deleted, with say Adam, Adadelta and so on in Stochastic gradient descent. Second, for experimental results, I only announced the result, which people can really grasp even if they don't understand it deeply. This helps to inspire people. For example, take this sentence: "Google's AlphaGo is a computer program which uses Deep Learning to play the game of Go, a very ancient game that has many more possible configurations than Chess, and has repeatedly beaten the best human players. Now, computers can learn Go from scratch and become the best in the world in a couple of days." Most people won't understand Deep Learning or Go, but you can tell this to everyone (even small children), and they will understand that Deep Learning is very useful, which can help inspire them to learn about Deep Learning and/or Go. Similarly, this sentence, which I wrote and which was deleted, can be told to everyone: "Backtracking line search can be implemented in Deep Neural Networks, and there are at least two reports that it works very well against popular algorithms such as Adam and Adadelta." It can inspire people to learn more about Backtracking line search. Tuyentruongoslo (talk) 11:42, 30 September 2020 (UTC)

Specific aspects of described algorithms

The article refers to a case "where $\mathbf {p} =\nabla f(\mathbf {x} )$ ". I think I see two problems with that. First, I think it has a sign error. If you want the function to decrease when you add $\alpha \,\mathbf {p}$ to $\mathbf {x}$ , assuming $\alpha >0$ , shouldn't you set $\mathbf {p} =-\nabla f(\mathbf {x} )$ ? Second, the article says that $\mathbf {p}$ should be a unit vector, but $\nabla f(\mathbf {x} )$ is not a unit vector, is it? In fact, shouldn't its magnitude be rapidly approaching zero as the algorithm converges? —BarrelProof (talk) 04:45, 5 October 2020 (UTC)[reply]

Yes, you are right, it should be $\mathbf {p} =-\nabla f(\mathbf {x} )$, thanks. Indeed, you don't need $\mathbf {p}$ to be a unit vector; you just need it to have the "right direction", and then you multiply by the learning rate to make the step appropriately big or small. If the sequence converges, which is what one hopes, then the sequence $\nabla f(\mathbf {x} _{n})$ will converge to 0, but how quickly depends on the objective function. For example, if you apply it to the function |x|, then it will approach 0 slowly. — Preceding unsigned comment added by Tuyentruongoslo (talk • contribs) 18:32, 5 October 2020 (UTC) (Indeed, since the function |x| is not differentiable at the minimum point 0, what I wrote about $\nabla f(\mathbf {x} _{n})$ does not apply; in this general case what one can claim, in case of convergence, is that the sequence $\alpha (\mathbf {x} _{n},\mathbf {p} _{n})\nabla f(\mathbf {x} _{n})$ approaches 0.) Tuyentruongoslo (talk) 04:26, 6 October 2020 (UTC)
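The sign and unit-vector points above can be checked numerically. The following is only a minimal sketch, not the article's algorithm verbatim: the test function `f`, the helper name `armijo_step`, and the parameter values `alpha0`, `tau` and `c` are assumptions chosen for illustration.

```python
import numpy as np

def f(x):
    # Illustrative quadratic test function (an assumption, not from the article).
    return x[0] ** 2 + 10.0 * x[1] ** 2

def grad_f(x):
    return np.array([2.0 * x[0], 20.0 * x[1]])

def armijo_step(f, grad_f, x, alpha0=1.0, tau=0.5, c=1e-4, max_tries=50):
    """Shrink alpha until Armijo's condition holds along p = -grad f(x)."""
    g = grad_f(x)
    p = -g                 # descent direction; deliberately not normalised
    m = float(g @ p)       # directional derivative, equal to -||g||^2 <= 0
    fx = f(x)
    alpha = alpha0
    for _ in range(max_tries):
        if f(x + alpha * p) <= fx + c * alpha * m:
            return alpha, p
        alpha *= tau       # backtrack
    raise RuntimeError("no acceptable step found")

x = np.array([1.0, 1.0])
alpha, p = armijo_step(f, grad_f, x)
x_new = x + alpha * p      # step along -grad f: the function value goes down
x_wrong = x - alpha * p    # step along +grad f: the function value goes up
print(alpha, f(x), f(x_new), f(x_wrong), np.linalg.norm(p))
```

In this example the norm of `p` is far from 1, and the accepted `alpha` absorbs the scale, which is consistent with the remark that only the direction of `p` matters.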

In the function minimization algorithm with iteration counter $n$, when we increment the value of $n$, should the starting value of $\alpha$ for the next backtracking line search be $\alpha _{0}$ again, or should it be the value of $\alpha (\mathbf {x} _{n},\mathbf {p} _{n})$ that was derived in the previous iteration? Or should it be something else? It seems like a waste to keep starting with the same value of $\alpha _{0}$ over and over again even though the distance moved should basically be shrinking as $n$ increases. But if the value of $\alpha _{0}$ is not constant with respect to $n$, then $\alpha (\mathbf {x} _{n},\mathbf {p} _{n})$ should have another argument, because the ending value of $\alpha$ is a function of the starting value of $\alpha$. —BarrelProof (talk) 03:37, 6 October 2020 (UTC)

In Backtracking line search, $\alpha _{0}$ is constant. It is observed (via experiments) that the performance of Backtracking line search is quite stable with respect to hyperparameters such as $\alpha _{0}$. You are right that if at every iteration one starts again from $\alpha _{0}$, then there can be a lot of waste. At step n, starting from $\alpha (\mathbf {x} _{n-1},\mathbf {p} _{n-1})$ is more sensible, and you can even increase the learning rate, not just always decrease it. This is exactly what Two-way Backtracking Gradient Descent deals with, which I wrote about but which is currently deleted. Tuyentruongoslo (talk) 04:31, 6 October 2020 (UTC)
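The warm-start idea described above could look roughly like the sketch below. It is one possible reading of "start from the previous learning rate, and allow it to grow as well as shrink", and is not claimed to reproduce the published Two-way Backtracking GD. The names `two_way_step` and `alpha_max`, the test function, and the constants are all assumptions.

```python
import numpy as np

def f(x):
    # Illustrative quadratic test function (an assumption).
    return x[0] ** 2 + 10.0 * x[1] ** 2

def grad_f(x):
    return np.array([2.0 * x[0], 20.0 * x[1]])

def two_way_step(f, grad_f, x, alpha_prev, alpha_max=1.0, tau=0.5, c=1e-4, max_tries=50):
    """Pick a step size for p = -grad f(x), starting from the previous one."""
    g = grad_f(x)
    fx, gg = f(x), float(g @ g)

    def armijo(a):
        return f(x - a * g) <= fx - c * a * gg

    alpha = alpha_prev
    if armijo(alpha):
        # Previous rate still works: try to enlarge it, up to the cap alpha_max.
        while alpha / tau <= alpha_max and armijo(alpha / tau):
            alpha /= tau
    else:
        # Previous rate is too large here: shrink as in the basic version.
        for _ in range(max_tries):
            alpha *= tau
            if armijo(alpha):
                break
    return alpha

x = np.array([1.0, 1.0])
alpha = 1.0                      # initial guess, also used as alpha_max
values, alphas = [f(x)], []
for _ in range(20):
    alpha = two_way_step(f, grad_f, x, alpha)
    x = x - alpha * grad_f(x)
    values.append(f(x))
    alphas.append(alpha)
print(values[-1], alphas[:5])
```

Every accepted step satisfies Armijo's condition, so the recorded function values never increase; the cap `alpha_max` keeps the learning rate bounded.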

If I understand correctly, the backtracking line search (BTLS) is just the process described in section 3 of the article that uses the iteration counter j, not the entire function minimization algorithm described in section 4, which uses multiple BTLSs – one for each value of n. The selection of $\alpha _{0}$ is outside of the BTLS process. Is that correct? If that is the case, then I would still be using BTLSs even if, in section 4, I chose a different $\alpha _{0}$ to perform the BTLS for each value of n. Personally, I would probably do something like increase $\alpha _{0}$ whenever a BTLS terminated after only a single step, and either use the terminating value of $\alpha$ to initialize the next BTLS or at least decrease $\alpha _{0}$ whenever a BTLS required several steps to terminate. Starting over with the same $\alpha _{0}$ for every value of n seems obviously ill-advised. —BarrelProof (talk) 21:10, 6 October 2020 (UTC)
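The heuristic proposed above, where the inner BTLS is left unchanged and only the starting value is adapted between outer iterations, can be sketched as follows. This is an illustration under stated assumptions: the function names `btls` and `minimise`, the cap `alpha_cap`, the test function and all constants are hypothetical, and no claim is made about how well the heuristic performs in general.

```python
import numpy as np

def f(x):
    # Illustrative quadratic test function (an assumption).
    return x[0] ** 2 + 10.0 * x[1] ** 2

def grad_f(x):
    return np.array([2.0 * x[0], 20.0 * x[1]])

def btls(f, x, g, alpha0, tau=0.5, c=1e-4, max_tries=50):
    """Inner search along p = -g. Returns (alpha, number of values tried)."""
    fx, gg = f(x), float(g @ g)
    alpha = alpha0
    for tries in range(1, max_tries + 1):
        if f(x - alpha * g) <= fx - c * alpha * gg:
            return alpha, tries
        alpha *= tau
    return alpha, max_tries   # gave up; this last alpha was not tested

def minimise(f, grad_f, x, alpha0=1.0, alpha_cap=1.0, tau=0.5, n_iter=30):
    history = []              # (f value, starting alpha0, tries) per iteration
    for _ in range(n_iter):
        g = grad_f(x)
        alpha, tries = btls(f, x, g, alpha0, tau=tau)
        x = x - alpha * g
        history.append((f(x), alpha0, tries))
        if tries == 1:
            # Accepted immediately: be more ambitious next time, up to the cap.
            alpha0 = min(alpha / tau, alpha_cap)
        else:
            # Needed backtracking: reuse the terminating value next time.
            alpha0 = alpha
    return x, history

x_start = np.array([1.0, 1.0])
x_end, history = minimise(f, grad_f, x_start)
print(f(x_end), sum(h[2] for h in history))
```

The sum of `tries` over the run gives a rough count of function evaluations, which can be compared against a run that restarts from the same `alpha0` every time.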

When you apply it to optimisation, the algorithm in section 4 is the original version used by Armijo, the most basic and simplest one, and actually it works very well in many cases. From this basic version, you can make modifications. Yes, you are right that we are allowed to change $\alpha _{0}$. On the other hand, you need some upper bound on how big your learning rate can be, if you want your sequence to behave well. This is treated in "Unbounded Backtracking GD" and "Upper bounds for learning rates" (for example, it is shown both theoretically and experimentally that for Morse functions you lose almost nothing if you use only the basic version, or the Two-way Backtracking GD with $\alpha _{0}$ being constant), which were in the article, but are currently deleted.

I also want to discuss the two citation requests in this paragraph: "Compared with Wolfe's conditions, which is more complicated, Armijo's condition has a better theoretical guarantee. Indeed, so far backtracking line search and its modifications are the most theoretically guaranteed methods among all numerical optimization algorithms concerning convergence to critical points and avoidance of saddle points.[citation needed] For example, if the cost function is a real analytic function, then it is shown in Absil, Mahony & Andrews (2005) that convergence is guaranteed. In Bertsekas (2016), there is a proof that for every sequence constructed by backtracking line search, a cluster point (i.e. the limit of one subsequence, if the subsequence converges) is a critical point. None of these results have been proven for any other optimization algorithm so far.[citation needed]"

The first claim is explained in the sentences that follow it, so I don't think a citation is needed there. For the second, I can cite the review part of my paper, but to be honest, since no such theoretical guarantees have so far been proven for other algorithms, you will hardly find a discussion of this anywhere in the literature. [Usually, people won't mention that a method cannot do something.]

Tuyentruongoslo (talk) 07:29, 7 October 2020 (UTC)