Talk:Matrix calculus/Archive 2


Derivative of matrix trace: Hessian vs Jacobian notation?

I am new to the Hessian vs Jacobian debate, but I appreciate the consistency of this article. The section on trace derivatives seems to go against it, however: the gradient of a scalar w.r.t. an n×m matrix X should be m×n, according to the article. tr(AXB) is such a scalar function, yet A^T B^T has dimensions n×m. So shouldn't the result be BA? The same goes for the other trace results. Would it not make sense to include a short guide explaining how to move from one notation to the other, since choosing between the notations seems to be a sticking point? 94.108.192.45 (talk) 16:00, 3 January 2009 (UTC)

Just a comment on this Hessian vs Jacobian debate: I was visiting this page and I noticed that the expressions for derivatives for matrix traces were transposed w.r.t. the notation used in the rest of the article. I went to the discussion page intending to start a discussion on this and noticed that the most recent comment was on this very same topic, noting the exact same problem, and there were no follow-ups to it in several months. Therefore I am going to take it upon myself to fix the issue (which is a simple transposition of an equation). (user danpovey, not currently logged on; change now made, June 9, 2009.)
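For later readers, here is the component-wise computation behind the point raised above (a sketch; the dimensions are assumed to be such that AXB is square):

$$\frac{\partial}{\partial X_{ij}}\operatorname{tr}(AXB) \;=\; \frac{\partial}{\partial X_{ij}}\sum_{p,q,r} A_{pq}\,X_{qr}\,B_{rp} \;=\; \sum_{p} A_{pi}\,B_{jp} \;=\; \big(A^T B^T\big)_{ij} \;=\; (BA)_{ji}.$$

Arranged in the shape of X, the partials form A^T B^T; in the transposed (gradient) layout used consistently elsewhere in the article, the result is BA, which is exactly the transposition fixed above.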

Product Rule Question

In general both $\partial Y/\partial X$ and $\partial Z/\partial X$ have 4 dimensions. Along which of the four dimensions is the multiplication performed in each of the two terms of the product rule? Since this is not clear from this article alone, maybe a note or link to another article would be useful? —The preceding unsigned comment was added by 129.82.228.14 (talkcontribs) 17:06, August 1, 2007 (UTC)

Good point. I don't have any idea how to do it, though. — Arthur Rubin | (talk) 17:26, 1 August 2007 (UTC)
If you work out the derivative at the component level you'll see that the first term in the product rule is done along the fourth dimension and the second term is done along the third dimension. While this is the only way the derivative will work, I agree that the notation is lacking since anyone trying to learn from this page would not know this. Either the notation needs to explicitly show this, or a note needs to be made on the page.

I am by no means an expert on this, but I think the last two dimensions are just along for the ride; that is, you can think of this four-dimensional matrix as a two-dimensional matrix with each element being a two-dimensional matrix itself. When you do the multiplication, you simply do the scalar multiplication of each element of the standard 2D matrix with the matrix element of the fake 2D matrix, leaving you again with a matrix of matrices. I also think the product rule equation is wrong. The first argument of the addition ($Z^T\,dY/dX$) needs to be transposed for the dimensions to work out.

$$\frac{\partial (YZ)}{\partial X} \;=\; Z^T\,\frac{\partial Y}{\partial X} + Y\,\frac{\partial Z}{\partial X} \qquad\text{needs to be}\qquad \frac{\partial (YZ)}{\partial X} \;=\; \left(Z^T\,\frac{\partial Y}{\partial X}\right)^{\!T} + Y\,\frac{\partial Z}{\partial X}$$

DRHagen (talk) 13:15, 17 May 2009 (UTC)

Actually, I'm having second thoughts about this change. I am going to revert it until I find a reference that says one way or the other, or I become more comfortable with the notation used on this page.DRHagen (talk) 19:33, 17 May 2009 (UTC)

I've convinced myself that the stated equation is incorrect by letting X be a scalar. This would cause dY/dx and dZ/dx to have the same dimensions as Y and Z, respectively. $Y^T Z$ does not have the same dimensions as $Z^T Y$ and, therefore, the two terms cannot be summed. See the Imperial College external link for its definition of the product rule. DRHagen (talk) 15:44, 18 May 2009 (UTC)

I'm afraid the product rule cannot be stated consistently if the "variable" of differentiation is a vector or matrix, as our definitions only yield genuine (at most 2-dimensional) matrices in the cases where the function to be differentiated is scalar, the variable is scalar, or we differentiate a column vector by a column vector or a row vector by a row vector. Perhaps it would be better to write out the chain rule and product rule in full 6-index notation so the reader can see what is meant? (In the case of row-vector by row-vector, the multiplication in the chain rule is reversed, anyway....) — Arthur Rubin (talk) 17:44, 3 July 2009 (UTC)
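For what it's worth, the vec form of the product rule avoids this dimension problem entirely, and it is easy to check numerically. A minimal sketch in NumPy (the test functions Y, Z and the evaluation point below are made up for illustration; vec stacks columns and np.kron is the Kronecker product):

```python
import numpy as np

# Check: d vec(Y Z)/dx = (Z^T kron I) d vec(Y)/dx + (I kron Y) d vec(Z)/dx
# for matrix-valued functions Y(x), Z(x) of a scalar x.

def vec(M):
    # Column-stacking vectorization.
    return M.reshape(-1, order='F')

def Y(x):  # 2x3 test function (made up for this check)
    return np.array([[x, x**2, 1.0], [np.sin(x), 2*x, x**3]])

def Z(x):  # 3x2 test function (made up for this check)
    return np.array([[x, 1.0], [np.cos(x), x**2], [2.0, x]])

x, h = 0.7, 1e-6

# Central finite differences for d vec(.)/dx
dY = vec(Y(x + h) - Y(x - h)) / (2 * h)
dZ = vec(Z(x + h) - Z(x - h)) / (2 * h)
lhs = vec(Y(x + h) @ Z(x + h) - Y(x - h) @ Z(x - h)) / (2 * h)

# vec(A B C) = (C^T kron A) vec(B) gives the two product-rule terms.
rhs = np.kron(Z(x).T, np.eye(2)) @ dY + np.kron(np.eye(2), Y(x)) @ dZ

print(np.allclose(lhs, rhs, atol=1e-5))  # True
```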

Matrix derivative and the chain rule

According to Jan R. Magnus, the derivative of a matrix is

$$\mathrm{D}\,F(X) \;=\; \frac{\partial\,\operatorname{vec} F(X)}{\partial\,(\operatorname{vec} X)^T}.$$

For this derivative, the chain rule (and other rules) apply in a straightforward fashion, also for tensors of rank > 2.

Sources:

Magnus, Jan R. (July 25, 2006). "Matrix calculus and econometrics" (PDF).

Magnus, Jan R. (November 21, 2008). "Derivatives and derisatives in matrix calculus" (PDF).

  Cs32en  15:53, 5 July 2009 (UTC)
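With the vectorized definition quoted above, the chain rule really is just a product of Jacobian matrices, and the dimensions can be checked at a glance. A sketch (assuming X is m×n, Y = Y(X) is p×q, and Z = Z(Y) is r×s):

$$\mathrm{D}Z(X) \;=\; \mathrm{D}Z(Y)\;\mathrm{D}Y(X), \qquad \underbrace{(rs \times mn)}_{\mathrm{D}Z(X)} \;=\; \underbrace{(rs \times pq)}_{\mathrm{D}Z(Y)}\;\underbrace{(pq \times mn)}_{\mathrm{D}Y(X)}.$$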

The chain rule works, but the product rule (and virtually all the other rules we've written) require proper use of the vec operator, perhaps with additional careful allocation of transpose. It may be the only way we could handle it without going to tensor notation, but it doesn't really seem satisfactory. — Arthur Rubin (talk) 16:47, 5 July 2009 (UTC)

In tensor notation, the above would be, for a 4-D tensor,

,

where and .

The vec operation in matrix calculus is, of course, a special case of the vec operation in tensor calculus. However, the matrix vec operation, as it's commonly understood, is:

$$\operatorname{vec}(X) \;=\; \big(x_{11},\, x_{21},\, \ldots,\, x_{m1},\, x_{12},\, \ldots,\, x_{mn}\big)^T,$$

while in tensor notation,

$$\operatorname{vec}(X) \;=\; \big(x_{11},\, x_{12},\, \ldots,\, x_{1n},\, x_{21},\, \ldots,\, x_{mn}\big)^T.$$

In my view, matrix calculus is a mathematical term, so we should use notation that is used in mathematics, not engineering notation. We can, however, stay within two dimensions, so that we do not need to introduce tensor notation.  Cs32en  00:12, 9 July 2009 (UTC)
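The two conventions are easy to state concretely. A minimal sketch in NumPy terms (column-major order='F' gives the usual matrix-calculus vec; row-major order='C' gives the lexicographic, tensor-style ordering):

```python
import numpy as np

X = np.array([[1, 2, 3],
              [4, 5, 6]])  # a 2x3 matrix

# Matrix-calculus vec: stack the columns of X on top of each other.
vec_matrix = X.reshape(-1, order='F')   # -> [1, 4, 2, 5, 3, 6]

# Lexicographic (row-major) flattening, as in the tensor-style
# convention discussed above.
vec_lex = X.reshape(-1, order='C')      # -> [1, 2, 3, 4, 5, 6]

print(vec_matrix)
print(vec_lex)
```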

Perhaps so. But there really isn't a single standard notation used in mathematics. — Arthur Rubin (talk) 17:58, 11 July 2009 (UTC)

Massive additions by Stpasha

Stpasha (talk · contribs) has added a number of formulae for derivatives of matrix expressions by scalars. I removed them, because:

  1. They are out of scope for this article.
  2. He/she introduces new notation (specifying which matrices are independent of t) which shouldn't apply to the existing formulae. (That could be fixed by changing X to X(t), but it shows the sloppiness of thought which seems to have gone into them.)
  3. A different notation for transpose was introduced.
  4. $\ln||X||$ is not well-defined.
  5. That leaves only (notation corrected):

$$\frac{\partial}{\partial t}\,\ln\left|\det X(t)\right| \;=\; \operatorname{tr}\!\left(X^{-1}\,\frac{dX(t)}{dt}\right),$$

which may be of some interest. Some of the other equations follow from the fact that if f is a scalar function of a matrix X, and X is a matrix function of a scalar t, then the chain rule, correctly written, becomes

$$\frac{d}{dt}\,f\big(X(t)\big) \;=\; \operatorname{tr}\!\left(\frac{\partial f}{\partial X}\,\frac{dX(t)}{dt}\right).$$

Arthur Rubin (talk) 21:02, 6 July 2009 (UTC)
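A quick worked instance of that chain rule (the function f(X) = tr(AX), with A constant, is chosen here purely for illustration): in the article's transposed gradient layout, ∂ tr(AX)/∂X = A, so the rule gives

$$\frac{d}{dt}\operatorname{tr}\!\big(A\,X(t)\big) \;=\; \operatorname{tr}\!\left(A\,\frac{dX(t)}{dt}\right),$$

which agrees with differentiating $\operatorname{tr}(AX) = \sum_{i,j} A_{ji} X_{ij}$ entry by entry.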

Arthur, with all due respect but I cannot agree with your arguments.
1. "Out of scope" is a strong claim. Matrix calculus can be thought as any calculus involving matrices: derivative of scalar w.r.t. matrix, vector w.r.t. vector, or matrix w.r.t. a scalar. I would say that matrix calculus is anything beyond the scope of standard calculus, but before we enter the domain of 3+ order tensors.
2. The fact that X depends on t now does not affect any other formulas as long as those formulas do not involve derivatives with respect to t. The reason why I didn't write X(t) was because it would only have made the notation cumbersome and more difficult to understand. No sloppiness of thought has occurred: it is common in algebra to have a,b,c denoting constants, and x,y,z variables.
3. Yes, a different notation for transpose was introduced. It makes formulas look tidier. The issue of bold vs. normal-font vectors, as well as ^T vs. ′ for transpose, hasn't been discussed on this talk page; however, I've seen quite a few such discussions on other pages, and generally people tended to prefer the latter. (Note: the ′ transpose is used only within the Example section, so it'll be easy to fix the notation.)
4. $\ln||X||$ is perfectly well-defined (unless X is singular): it's the natural logarithm of the absolute value of the determinant of X.
5. And regarding the chain rule to which you appeal, well, it's already been pointed out that, as it is written right now, the rule is fallacious. A derivative dZ/dY is not well-defined within the domain of matrix calculus, since it's no longer a matrix but instead a 4th-order tensor. And a product dZ/dY × dY/dX is not even correct mathematical notation.
The chain rule as you've written it is interesting indeed. Not being a specialist on this subject, I do not see how it follows from any of the formulas written on the page. Maybe you should consider adding this formula to the page's content? Anyway, the formula works only when ƒ is a scalar function. However, I don't see how you suggest handling, for example,
// Stpasha (talk) 22:54, 6 July 2009 (UTC)
1. OK, it's arguable. It doesn't seem that there's a general consensus either way.
2. I think X(t) is the only way we can maintain a standard notation for the article; a (constant) and a (constant vector) are already causing enough trouble. It seems best to have all dependencies explicit, rather than introducing another notation.
3. It's T now; there's no reason to change, and I do not like ′ (which is different from the ASCII apostrophe "'", making it more difficult to match text and formulas).
4. $\ln||X||$ is semantically wrong for the log of the determinant, for two reasons: the two vertical bars have different meanings, and we don't use the vertical bar for the determinant elsewhere in the article. Perhaps $\ln\left|\det X\right|$, but then it follows immediately from the chain rule applied to scalar functions, although I suppose it still is of some interest.
5. I agree that the expression works only for a scalar function, but that generalizes what you wrote. As for , the only sensible way of writing it is in terms of formal differentials: , although has some elegance to it.
(That's another reason I don't like ′ for transpose; X′ makes sense as a derivative.)
Arthur Rubin (talk) 23:34, 6 July 2009 (UTC)
The article notes: the directional derivative of f in the direction of matrix Y is given by $\nabla_Y f = \operatorname{tr}\!\left(\frac{\partial f}{\partial X}\,Y\right)$.
It follows that $\frac{d}{dt}\,f\big(X(t)\big) = \operatorname{tr}\!\left(\frac{\partial f}{\partial X}\,\frac{dX(t)}{dt}\right)$.
My mistake as to the transpose. — Arthur Rubin (talk) 23:40, 6 July 2009 (UTC)
Alright, so how about we add this formula, plus a formula for $\partial_X \operatorname{tr}(X^{-1}A)$ and a formula for $\partial_t X^{-1}$; all the others can be derived from these 3. And I don't think there is a need to explicitly state the dependence X(t), seeing as the article hasn't been doing this in the sections "vector calculus" and "matrix calculus". It might also be prudent to replace ƒ and F with y and Y, in order to reduce the number of different symbols in the article. // Stpasha (talk) 00:45, 7 July 2009 (UTC)
I cannot agree that the removal of X(t) is a good idea; for all other formulas except the chain rule and product rule (which are pretty questionable, themselves), only the named variable is ... well, variable. — Arthur Rubin (talk) 06:49, 7 July 2009 (UTC)

Full tensor notation

For the chain and product rules.... Suppose we write the entry corresponding to as . (I think that's the way the block matrices we selected work.)

Then the formal chain rule becomes

and the formal product rule becomes (with ), as

,

but I can't think of a good way of putting it into the article. — Arthur Rubin (talk) 00:20, 9 July 2009 (UTC)

Special cases where all the intermediate matrices are "matrices", rather than block matrices:
  • Chain rule
    • X, Y, and Z are column vectors (suppressing j l β)
    • X, Y, and Z are row vectors (suppressing i k α)
    • X and Z are scalars (suppressing i j k l)
  • Product rule
    • X is a scalar (suppressing k l)
    • X and Y are column vectors, Z is a scalar (suppressing j l α)
      same equation, except the first product is not a matrix multiply, but a matrix × scalar (!)
    • X and Z are row vectors, Y is a scalar (suppressing i k α)
      same equation, except the second product is not a matrix multiply, but a scalar × matrix (!)
    • (added) Y and Z are scalars (suppressing i j α)
      same equation again, except that both products are matrix × scalar or scalar × matrix (!!)

Arthur Rubin (talk) 00:49, 9 July 2009 (UTC)
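For reference, whatever layout convention is adopted has to reduce to the unambiguous component-level statements (a sketch in plain index notation, not the block-matrix notation above):

$$\frac{\partial Z_{kl}}{\partial X_{ij}} \;=\; \sum_{\alpha,\beta} \frac{\partial Z_{kl}}{\partial Y_{\alpha\beta}}\,\frac{\partial Y_{\alpha\beta}}{\partial X_{ij}}, \qquad \frac{\partial (YZ)_{kl}}{\partial X_{ij}} \;=\; \sum_{\alpha}\left(\frac{\partial Y_{k\alpha}}{\partial X_{ij}}\, Z_{\alpha l} \;+\; Y_{k\alpha}\,\frac{\partial Z_{\alpha l}}{\partial X_{ij}}\right).$$

The special cases listed above are exactly the ways of suppressing indices here that leave an ordinary matrix product.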

Giving the formulae for single entries of the resulting matrix would probably not be of much help for the reader if it is not clear where in the resulting matrix the entries are located. My proposal would be:

$$\frac{\partial \operatorname{vec} Z}{\partial\,(\operatorname{vec} X)'} \;=\; \frac{\partial \operatorname{vec} Z}{\partial\,(\operatorname{vec} Y)'}\;\frac{\partial \operatorname{vec} Y}{\partial\,(\operatorname{vec} X)'}$$

and

$$\frac{\partial \operatorname{vec}(YZ)}{\partial\,(\operatorname{vec} X)'} \;=\; \big(Z' \otimes I\big)\,\frac{\partial \operatorname{vec} Y}{\partial\,(\operatorname{vec} X)'} \;+\; \big(I \otimes Y\big)\,\frac{\partial \operatorname{vec} Z}{\partial\,(\operatorname{vec} X)'}.$$

  Cs32en  06:30, 9 July 2009 (UTC)

I don't think ⊗ is a standard tensor product, although I'm not sure. And I'm still absolutely opposed to using ′ for transpose in an article which could logically use ′ for a derivative, without explicitly defining the operations. — Arthur Rubin (talk) 18:31, 11 July 2009 (UTC)
Also, it is clear what order the entries are in the resultant matrix: row "numbers" and column "numbers" are each in lexicographical order: 11, 12, 13, ..., 21, 22, 23, ... etc. — Arthur Rubin (talk) 08:43, 12 July 2009 (UTC)
I don't have any problems with using T instead of ′ for the transpose operator. This is "just" an issue of notation. ⊗ is the Kronecker product. Sorry for the confusion related to the ordering of entries; I had been reluctant to say that the ordering of entries, as given in your equations above, is wrong.  Cs32en  09:48, 12 July 2009 (UTC)
Sorry, I accept the Kronecker product notation as standard. Apologies for my confusion. — Arthur Rubin (talk) 01:11, 23 July 2009 (UTC)

Disputed information: Matrix derivative

The following equation from the article, in section Matrix calculus, does not seem to be correct:

See Abadir, Karim M.; Magnus, Jan R. (March 12, 2007). "On some definitions in matrix algebra" (PDF). p. 11. Retrieved July 9, 2009. (Replacing bot signature for my own comment.)  Cs32en  18:34, 11 July 2009 (UTC)

It's correct as we use it, although our notation doesn't appear completely standard. However, the article is now wrong as a whole, since some of the equations use the notation we selected and some use the notation you prefer. It might be better to revert to a consistent (if not entirely correct) article, rather than one in which each line uses a different notation. — Arthur Rubin (talk) 17:53, 11 July 2009 (UTC)
I have changed the equations only in those cases in which the results were wrong, independent of the notation that is being used. If you define the derivative as $\frac{\partial Y}{\partial X}$ instead of as $\frac{\partial\,\operatorname{vec} Y}{\partial\,(\operatorname{vec} X)^T}$, then the results are only correct if X and Y are both vectors. Therefore, I left such expressions as $\frac{\partial y}{\partial x}$ unchanged, although the better notation would be $\frac{\partial y}{\partial x^T}$ (which is the same as $\frac{\partial\,\operatorname{vec} y}{\partial\,(\operatorname{vec} x)^T}$). Unfortunately, you cannot use the notation that ignores the vectorization operators if you are dealing with matrices or tensors of higher order in the derivatives. If we restrict the article to this notation, the equations involving matrices would have to be removed, and the article would no longer treat matrix calculus, but only vector calculus.  Cs32en  18:34, 11 July 2009 (UTC)
(ec) I've reverted your effective change of the definition of the matrix derivative; it may be better, and I'll help maintain the article if it's selected, but there is no agreement to use it. I may have reverted some changes that would be helpful, but nothing I restored (except the chain and product rules) is wrong. But your product rule doesn't make any sense without explicitly using the vec definition, even disregarding the question of ′ for transpose or derivative. — Arthur Rubin (talk) 18:40, 11 July 2009 (UTC)

An example: the chain rule

The chain rule is currently given in the article as follows:

Consider the functions Z(y) and y(x), where Z is a 2×2 matrix, y is a 2-element column vector, and x is a scalar.

Then, according to the definition of the derivative given above, $\frac{\partial Z}{\partial y}$ is a 2×4 matrix, and $\frac{\partial y}{\partial x}$ is a 2×1 matrix. The matrices thus cannot be multiplied.

Thus, the chain rule, as given in the article, is not correct if the definition of the derivative given in the article is being used.  Cs32en  19:26, 11 July 2009 (UTC)
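For comparison, the same example has consistent dimensions in the vec notation (a sketch):

$$\underbrace{\frac{\partial\,\operatorname{vec} Z}{\partial x}}_{4 \times 1} \;=\; \underbrace{\frac{\partial\,\operatorname{vec} Z}{\partial\, y^T}}_{4 \times 2}\;\underbrace{\frac{\partial y}{\partial x}}_{2 \times 1}.$$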

The chain rule according to the definition currently presented in the article

    ✗

  Cs32en  14:52, 12 July 2009 (UTC)

Cs32en: You've incorrectly expanded the definition of $\frac{\partial Z}{\partial y}$. It's a 1×2 matrix, whose elements are themselves 2×2 matrices:

You made the mistake of dropping the inner extra brackets, that's all. This example of the chain rule works out fine. —Preceding unsigned comment added by Q91 (talkcontribs) 18:42, 22 July 2009 (UTC)

This way, a result can be computed, and you arrive at a collection of partial derivatives which are, each by itself, correct. However, the method you propose fails as soon as you have a derivative with regard to a matrix. If
would be true, then, in general,
.    ✗
Other calculations that make use of such a definition of the matrix derivative fail in similar ways.
In addition, the article doesn't properly explain that "extra brackets" should be used. The use of such brackets would also mean that we would move outside the scope of matrix calculus, as it implies the use of tensors.   Cs32en  00:12, 23 July 2009 (UTC)
I agree completely with Cs32en on this issue; the chain rule does not work as written unless the matrices are all vectors. You can do something with block matrices, but the block matrix notation as Q91 suggests works only if x is scalar. — Arthur Rubin (talk) 01:04, 23 July 2009 (UTC)
To be precise: if x is scalar, then
where the dZ/dY is a block matrix, and the matrix multiply and trace are taken as block matrices (with the inner multiply in the block matrix multiply being a matrix by scalar multiply). If Y is a column vector, the trace operator is unnecessary. But this is probably too complicated for inclusion. — Arthur Rubin (talk) 01:25, 23 July 2009 (UTC)
To be honest, I was only concerned with the case when x is a variable. I haven't thought about anything beyond that. I just happened to notice that you expanded the definition incorrectly in your first counterexample. I use physics-style tensors myself. I was just curious to take a look at the "matrix calculus" notation for similar things. It seems like there is no definitive answer around here... (yet)! Q91 (talk) 03:58, 23 July 2009 (UTC)
There actually is a definitive answer. An example of a correct usage of the matrix derivative can be found at Elasticity tensor#Anisotropic homogeneous media and Hooke's law#Anisotropic materials Cs32en  12:48, 23 July 2009 (UTC)

The chain rule is simply wrong if we do not specify which product is used! Let be matrices of size respectively. According to this article, is a "formal block matrix" of size whose components are matrices of size , or a "flat matrix" of sizes . Now for the given chain rule , we have to multiply a block matrix with a block matrix to yield a block matrix, or multiply a matrix with a matrix to yield a matrix. There is no product I know of which can produce such a result!

I meant without using tensor product and contraction. --Waldelefant (talk) 17:18, 22 July 2010 (UTC)

Chain rule involving matrix valued functions

I did not notice the remark about vectors; I am very sorry about the entry above. Thus, should we state that the chain rule (involving matrix-valued functions) should be read as

Waldelefant (talk) 17:40, 25 July 2010 (UTC)


The chain rule according to the vectorial definition of the matrix derivative

Let .

    ✓

Note: In general, , if v is not a scalar.

  Cs32en  14:52, 12 July 2009 (UTC)

Proposed "Identities" section

Note that matrix multiplication is not commutative, so in these identities, the order must not be changed.

  • Chain rule: If Z is a function of Y which in turn is a function of X, then $\frac{\partial \operatorname{vec} Z}{\partial\,(\operatorname{vec} X)'} = \frac{\partial \operatorname{vec} Z}{\partial\,(\operatorname{vec} Y)'}\,\frac{\partial \operatorname{vec} Y}{\partial\,(\operatorname{vec} X)'}$
  • Product rule: $\frac{\partial \operatorname{vec}(YZ)}{\partial\,(\operatorname{vec} X)'} = (Z' \otimes I)\,\frac{\partial \operatorname{vec} Y}{\partial\,(\operatorname{vec} X)'} + (I \otimes Y)\,\frac{\partial \operatorname{vec} Z}{\partial\,(\operatorname{vec} X)'}$

  Cs32en  19:53, 11 July 2009 (UTC)

Proposed "Identities" section comments

Completely unacceptable. The only formulations which are at all acceptable, even in your preferred notation, are:

where the vec can be assumed, and
where some of the vec can be assumed. — Arthur Rubin (talk) 20:10, 11 July 2009 (UTC)

Proposed "Examples" section

Derivative of linear functions

This section lists some commonly used vector derivative formulas for linear equations evaluating to a vector.

Assuming x and a are "column vectors", the first result above is correct. But in the main article the result is transposed. Why? The second result above is inconsistent with the first and is incorrect. The natural rule is to use the orientation of x to determine the orientation of the result. (Do it element by element to see.) Cerberus (talk) 03:34, 3 December 2009 (UTC)
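To make the layout question concrete: for f(x) = a^T x, with x and a both n×1, the partials are ∂f/∂x_i = a_i; the two conventions merely arrange the same numbers differently (a sketch):

$$\frac{\partial\, a^T x}{\partial x} \;=\; a \quad (n \times 1,\ \text{denominator layout}) \qquad\text{vs.}\qquad \frac{\partial\, a^T x}{\partial x} \;=\; a^T \quad (1 \times n,\ \text{numerator layout}).$$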
The following should be correct (see below). I don't know whether the above version may be used by engineers or not. One may assume that the "denominator" is always considered to be a row vector, but such a convention is not particularly useful as soon as there are matrices, not vectors, in the "denominator".  Cs32en  02:10, 7 December 2009 (UTC)
Actually, the correct form is:
in the notation we had selected. — Arthur Rubin (talk) 06:44, 7 December 2009 (UTC)
That is at least a consistent notation, but it is (i) unusual and (ii) undesirable. Conceptually we just have a function $f(x)$ and we are trying to decide how best to understand $\partial f/\partial x$ and $\partial f/\partial x^T$. Since $x$ is a "column vector" (i.e., an $n \times 1$ matrix), the only natural way to read $\partial f/\partial x$ is as a column vector, and the only natural way to read $\partial f/\partial x^T$ is as a row vector (i.e., as the gradient). Cerberus (talk) 19:46, 7 December 2009 (UTC)
Not really. Using the "present" notation,
  1. If x and f are column vectors, then
  2. If x and f are row vectors, then
Using your convention, if x is a column vector and f is a scalar, then
where the centered dot represents the dot product, rather than matrix multiplication.
As an aside, the derivative of a scalar with respect to a covariant vector is a contravariant vector, so there's some rationale for distinguishing the space they live in. — Arthur Rubin (talk) 20:59, 7 December 2009 (UTC)
Hi Arthur, "d" is probably what you meant to use in the formulae, not "δ", right?  Cs32en  21:15, 7 December 2009 (UTC)
Actually, I meant "δ: indicating the actual change in a variable, rather than "d", indicating a differential. — Arthur Rubin (talk) 21:15, 13 December 2009 (UTC)

I think that Chapter 6 of Abadir, Karim M.; Magnus, Jan R. (March 12, 2007). "On some definitions in matrix algebra" (PDF). p. 11. Retrieved July 9, 2009, can help us to clarify these issues. Did you have a look at that text, Cerberus? I could also provide some content from Magnus and Neudecker, 2002 (1988), Matrix Differential Calculus, 2nd rev. ed.  Cs32en  21:11, 7 December 2009 (UTC)

Derivative of quadratic functions

This section lists some commonly used vector derivative formulas for quadratic matrix equations evaluating to a scalar.
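A representative entry, worked component-wise (a sketch; the layout of the result follows whichever convention is in force): for f(x) = x^T A x,

$$\frac{\partial f}{\partial x_i} \;=\; \sum_j \big(A_{ij} + A_{ji}\big)\, x_j, \qquad\text{so}\qquad \frac{\partial\, x^T A x}{\partial x} \;=\; x^T\big(A + A^T\big)$$

in the row layout, or $(A + A^T)\,x$ in the column layout.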

Related to this is the derivative of the Euclidean norm: $\frac{\partial\,\|x\|}{\partial x} = \frac{x^T}{\|x\|}$.

Derivative of matrix traces

This section shows an example of the derivative and the differential of a common trace function.

Derivative of the determinant

Correction:


  Cs32en  19:53, 11 July 2009 (UTC)
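For reference, the standard identity that any determinant example must reduce to is Jacobi's formula, stated here in differential form:

$$d\det(X) \;=\; \operatorname{tr}\!\big(\operatorname{adj}(X)\, dX\big) \;=\; \det(X)\,\operatorname{tr}\!\big(X^{-1}\, dX\big),$$

the last form holding for invertible X.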

Proposed "Examples" section comments

Remark: I have changed the notation for transpose after Arthur Rubin posted the following comment.  Cs32en  23:53, 11 July 2009 (UTC)

You've added 4 new notations for the trace and determinant sections:

  1. D F
  2. d F
  3. vec
  4. "'" for transpose.

You've also removed our formulas involving derivatives by a row vector, which are perfectly well defined in our notation, but not in yours. If you correct the ones which can be properly stated in our notation, we can consider whether the addition of the other notations is appropriate. — Arthur Rubin (talk) 19:59, 11 July 2009 (UTC)

I have shown in the section An example: the chain rule above that the notation which is currently being used in the article is invalid if there are matrices (i.e. not just vectors) in the derivative.
As for the definitions above,
  1. $\mathrm{D}F$ is the derivative, i.e. an incremental change related to some other variable.
  2. $\mathrm{d}F$ is the differential, which defines how an incremental change is being calculated.
  3. $\operatorname{vec}$ is the well-defined vectorization operator.
  4. ′ is the transpose operator. I would not object to changing this to $^T$, of course.
These are not new notations, actually, but new mathematical operators. In my view, the identities and the examples (especially the product rule) cannot be understood without introducing these operators.  Cs32en  20:19, 11 July 2009 (UTC)
(ec) And you complained about my convoluted indexing above. Your formulas for far exceed anything I could come up with. — Arthur Rubin (talk) 20:21, 11 July 2009 (UTC)
(ec 2) d is standard, and vec is acceptable; D has no credible meaning, and ′ as the transpose operator should never be used in places where ′ as a derivative is plausible, such as this article. — Arthur Rubin (talk) 20:23, 11 July 2009 (UTC)
There is no real controversy about the differential, the vec operator and the transpose, as far as I can see. $\mathrm{D}F(X)$ is defined as

$$\mathrm{D}F(X) \;=\; \frac{\partial\,\operatorname{vec} F(X)}{\partial\,(\operatorname{vec} X)^T},$$

a derivative involving two vectors. The elements of the derivative are the scalars $\frac{\partial\,(\operatorname{vec} F(X))_i}{\partial\,(\operatorname{vec} X)_j}$.
Sorry for the complicated formulae. Less complicated formulae are probably incorrect, however.  Cs32en  20:57, 11 July 2009 (UTC)
The transpose in the denominator is just wrong. Without the transpose, we have $\frac{\partial\,\operatorname{vec} F(X)}{\partial\,\operatorname{vec} X}$, which looks like a normal derivative. For matrices, $\frac{\partial\,\operatorname{vec} Y}{\partial\,\operatorname{vec} X}$, which still makes sense only without the transpose.
Actually, that's

$$\mathrm{D}F(X) \;=\; \frac{\partial\,\operatorname{vec} F(X)}{\partial\,(\operatorname{vec} X)^T}.$$
For the notation, see also: Abadir, Karim M.; Magnus, Jan R. (2005). Matrix algebra. Cambridge University Press. pp. 351–395. Retrieved July 11, 2009.  Cs32en  22:16, 11 July 2009 (UTC)
I don't see why we're required to use misleading notation, even if (a) source uses it. — Arthur Rubin (talk) 22:23, 11 July 2009 (UTC)
The notation is not misleading, of course. It leads to correct results, while the other notation does not if there are matrices, not just vectors, in the differentials or derivatives. Note that the source is a book published by Cambridge University Press. You can find the same formulae in Magnus, Jan R.; Neudecker, H. (1999; first edition 1988). Matrix Differential Calculus with Applications in Statistics and Econometrics. John Wiley. Retrieved July 11, 2009. This is a standard work in the field.  Cs32en  22:45, 11 July 2009 (UTC)
I disagree that it's a standard work in the field, and the notation is clearly wrong dimensionally, even if redefined by Magnus to be correct. Also, contrary to what you've written, our notation for $\frac{\partial Y}{\partial X}$ makes sense (without introducing tensors, block matrices, or tensor products) as long as neither:
  1. Y has multiple columns, and X has multiple rows, nor
  2. Y has multiple rows, and X has multiple columns.
Equivalently, if
  1. Y is scalar,
  2. X is scalar,
  3. Y and X are both column vectors, or
  4. Y and X are both row vectors,
there is no difficulty in the definition. This means the product rule rarely makes sense, but we can't have everything. Your product rule only makes sense if you carefully define vec, vec^{-1}, and ⊗. (Actually, vec^{-1} is only necessary if we are to use conventional matrix multiplies of derivatives, which may not be required.) — Arthur Rubin (talk) 08:24, 12 July 2009 (UTC)
← It's not $\frac{\partial Y}{\partial X^T}$, but $\frac{\partial\,\operatorname{vec} Y}{\partial\,(\operatorname{vec} X)^T}$.
I agree with your observation that the equations are correct as long as the entities in the differential and the derivative do not have more than one non-singleton dimension. However, as the article is about matrix calculus, we cannot restrict ourselves to describing these cases. Note also that this is not about tensor mathematics. For example,
$$\operatorname{vec}(AXB) \;=\; \big(B^T \otimes A\big)\operatorname{vec}(X)$$
is not generally valid if A, X or B are tensors.  Cs32en  09:37, 12 July 2009 (UTC)
Additional comment: The equations do not involve an inverse of the vec operator (vec^{-1}), but the vectorization of the inverse of a matrix: vec(X^{-1}).  Cs32en  09:42, 12 July 2009 (UTC)
I agree that vec^{-1} is not needed, although it could simplify equations in some cases. $\frac{\partial\,\operatorname{vec} Y}{\partial\,(\operatorname{vec} X)^T}$ is dimensionally wrong. The correct formulation, if derivatives are to act properly as matrix (or tensor) operations, is: the elements of the derivative are the scalars $\frac{\partial\,(\operatorname{vec} Y)_i}{\partial\,(\operatorname{vec} X)_j}$. — Arthur Rubin (talk) 14:25, 12 July 2009 (UTC)
vec^{-1} is not very well defined for matrices, although it's useful for tensors (outside the scope of this article). I don't see how the vectorial definition of the matrix derivative should be dimensionally wrong. If , and , then  Cs32en  15:07, 12 July 2009 (UTC)
I should have said covariant vs. contravariant is what's wrong, although which is which is unclear. In vector calculus, dy/dx should be dimensionally the same as y/x, which is a matrix. dy/d(x′) would be something completely different. — Arthur Rubin (talk) 15:54, 12 July 2009 (UTC)
The derivative of matrix with regard to a matrix is actually a mixed tensor of rank 4. However, it can be represented by an array that has the same properties as a matrix, as long as we are only dealing with vectors and matrices, i.e. no tensors of higher rank. So the vectorial definition of the matrix derivative is actually only valid for (as the name implies) derivatives of matrices, not for derivatives involving tensors of higher rank. The definition of the tensor derivative, however, is similar, but it uses a slightly different definition of the vectorization operator.  Cs32en  16:27, 12 July 2009 (UTC)
$\frac{dy}{dx}$ is a sloppy notation often used in vector calculus, which does no harm as long as you are only dealing with vectors in the derivative, not matrices.  Cs32en  16:40, 12 July 2009 (UTC)
Can you explain why the vectorial definition of the matrix derivative doesn't extend to arbitrary — well, not exactly tensors, but multi-dimensional arrays? The definition ignores any covariant-contravariant distinction in the underlying matrices, so it doesn't directly apply to matrices-as-linear-operators.
And your comment is exactly the reverse of what is correct mathematically. would be a tensor of order 2 which is not a matrix. — Arthur Rubin (talk) 16:46, 12 July 2009 (UTC)
Ad 1.) Because the vec operator, when applied to a tensor, usually increases the rank of the tensor. (It depends somewhat on what type of tensor we are talking about, in any case, it increases the number of contravariant dimensions.) In the context of vectors and matrices, the result of the vec operator is a vector.
Ad 2.) Yes, in the context of tensor mathematics, it would be a mixed tensor of order 2. Such a tensor, in the context of matrix calculus, can be represented by a matrix, however.
Additional remark: In a tensor, covariant and contravariant dimensions are defined, while this is not the case for an arbitrary multidimensional array.
Question: Did you have a look at the detailed example on the chain rule that I have included above?  Cs32en  17:14, 12 July 2009 (UTC)
Unless you have a different definition of vec than I can see, the vec operator produces a 1-tensor, regardless of the degree of the tensor it is applied to. If not, you need to define it.
And any differential operator clearly converts the variable differentiated against between covariant and contravariant, which corresponds in the matrix domain to a transpose. An explicit transpose is wrong.
Aside from that, the chain rule and the product rule work better in your notation. However, all the formulas now in the document, except the chain rule and product rule, are (or recently have been) correct in the present notation. Any claims otherwise have previously been rejected. — Arthur Rubin (talk) 22:50, 12 July 2009 (UTC)
← The vec operator works as follows:
Matrix calculus: $\operatorname{vec}(X) = \big(x_{11},\, x_{21},\, \ldots,\, x_{m1},\, x_{12},\, \ldots,\, x_{mn}\big)^T$
Tensor calculus (this depends somewhat on what kind of tensors we are talking about): $\operatorname{vec}(X) = \big(x_{11},\, x_{12},\, \ldots,\, x_{1n},\, x_{21},\, \ldots,\, x_{mn}\big)^T$
With regard to the conversion to the transpose, actually transforms x into its transpose, x^T.
What is the meaning of saying that "the chain rule and the product rule work better" when using the vectorial definition of the derivative, when these rules actually don't work at all with the definition that is currently being presented in the article? Of course, the examples in the article are derived from the application of both rules, so there is little chance of finding any example that is correct when using a definition of the derivative for which the chain rule and the product rule do not work. And the solutions for the examples given in the article are indeed wrong.  Cs32en  05:45, 13 July 2009 (UTC)
Nothing, other than the chain rule and the product rule, is wrong in this article. The chain rule and product rule, using the notation presently in the article, only work in full generality if you go to full tensor notation. And your comments on the transpose of the vec in the denominator are contrary to the standard notation used in the vector calculus section and the notation normally used in vector fields in mathematics, and should be ignored as being inappropriate, even if (possibly) correctly redefined in Magnus's papers. — Arthur Rubin (talk) 14:49, 13 July 2009 (UTC)
I'm glad that we are agreeing that the presentation of the chain rule and the product rule, as currently included in the article, is wrong. Let me repeat that there is no problem with the notation (the lower part is not really a denominator), as long as we are only dealing with vectors – or vector fields, for that matter. The work of Heinz Neudecker, Jan R. Magnus, Karim M. Abadir and others is not just "papers". See, for example, the list of reprints for Magnus, J. R. and H. Neudecker (1988). Matrix Differential Calculus with Applications in Statistics and Econometrics, John Wiley and Sons: Chichester/New York. Reprinted 1990. First revision 1991, reprinted 1994, 1995, 1997, 1998. Second edition (paperback) 1999, reprinted 1999, 2001. Google Scholar lists 1,523 citations for this work. – Remark: Some of the examples, e.g. the derivative of tr(AXB), seem to be consistent with the (incorrect) definition of the matrix derivative presented in the article.
I think that we have both presented our respective views on this issue quite clearly. Continuing the discussion at this point will likely lead mostly to a repetition of previously stated observations. So, I'll continue discussing this when (a) someone shows up here as a result of the request at the math project page, (b) a reference is given for the formulae that I have tagged with {{fact}} templates, (c) some really new argument appears, or (d) the article is being changed in some way. (Please feel free to post a reply to this comment.)  Cs32en  15:23, 13 July 2009 (UTC)
Agreed. — Arthur Rubin (talk) 15:45, 13 July 2009 (UTC)

Help request at WikiProject Mathematics

I've asked for help on the definitions of the matrix differential and the matrix derivative at Wikipedia talk:WikiProject Mathematics#Matrix calculus: Definition of the matrix derivative Cs32en  22:54, 11 July 2009 (UTC)

Good luck. I think it's been tried 4 or 5 times already, mostly before I came on board, so you can't really blame me for the problems. — Arthur Rubin (talk) 08:26, 12 July 2009 (UTC)
I don't blame you. It's actually a situation where errors in several sources, including other internet sites, are reinforcing each other.  Cs32en  09:17, 12 July 2009 (UTC)

Perhaps one, or both, of you could create a new section on this talk page and succinctly list the issues for which outside comment is being sought. Then re-post on the WP Math talk page linking to that section. I would certainly be much more likely to comment if this discussion was more clearly delineated. Cheers. RobHar (talk) 19:51, 13 July 2009 (UTC)

Scope of questions

Remark by Cs32en: The following exposition presents my proposal as if it were not clearly defined. Discussing the aspects of this exposition that, in my view, would need to be changed would, however, reopen the controversial discussion that should be avoided when presenting an overview intended to make the debate accessible to new editors. I have therefore presented an overview of the main point of the controversy in the section Definition and notation of the matrix derivative below. Most of the other actual or potential disagreements can, in my view, be cleared up after this fundamental controversy has been resolved.  Cs32en  10:54, 15 July 2009 (UTC)

Definition of matrix derivative

The existing article is equivalent to defining the entries of $\frac{\partial Y}{\partial X}$ as

$$\left(\frac{\partial Y}{\partial X}\right)_{(li)(kj)} \;=\; \frac{\partial Y_{lk}}{\partial X_{ij}},$$

where l i means the indices are listed in lexicographical order.

CS32en's proposal is defining the entries explicitly as

or

depending on the choice of vec operator. (He seemed to be using the former earlier, but the later definition of the vec operator seems to lead to the latter.)

Whether one of these definitions is to be selected, or yet another one, is one of the primary matters. I think our decision should be based on what's actually used in the real world.

If CS32en's proposal as to the effect of the definition is accepted, then it's clear to me that the definition should be

$$\frac{\partial\,(\operatorname{vec} Y)^T}{\partial\,\operatorname{vec} X},$$

rather than his definition as

$$\frac{\partial\,\operatorname{vec} Y}{\partial\,(\operatorname{vec} X)^T},$$

in spite of his claims that that's the way it's done. I have a number of references for the vector derivative, all of them having an implied transpose in the "dependent variable".

Also, there is the question of whether formal differential notation should be used, leading to the elegant expressions which don't fit in either of our notations,

$$d(YZ) \;=\; (dY)\,Z + Y\,(dZ)$$

and

$$df \;=\; \operatorname{tr}\!\left(\frac{\partial f}{\partial X}\,dX\right).$$

Notation in article

I've seen the d notation used in all sorts of formal differentials before, contrary to what I wrote above. That might simplify the presentation in general.

However, the vec notation would need to be defined, sourced, and verified as standard, and D(Y)(X) (apparently the part of dY due to changes in X) doesn't appear at all standard, and would need to be defined, sourced, and verified as standard as well.

There are two questions here, as well; whether the notation can be used (as adequately sourced), and whether it should be used.

Presentation of identities

(In regard the #Identities section of the article)

If the current definition is kept, whether the chain rule and product rule should be edited as I noted in the section #Full tensor notation above, restricted to cases where the result is a "true matrix" (i.e., only one of the subscripts combined lexicographically is nontrivial), and/or whether some form of the formal 5-or-6-index sums should be included.

If CS32en's preferred notation is selected, whether the derivation of the rules should be included. (I would say not, as it depends on concepts not commonly used.)

Presentation of examples

Whether any of the examples in the article are presently wrong. (I would say not, but Cs32en objects to all of those which have a derivative with respect to a matrix.)

Whether derivation of the examples should be included, in CS32en's notation. (Again, even if selected, the derivation seems to require even more bizarre notation than the formulas, which would need to be defined.)

Arthur Rubin (talk) 21:02, 13 July 2009 (UTC)

Comment by Cs32en

Regarding the derivative, if we denote contravariant (vertical) indices as superscripts and covariant (horizontal) indices as subscripts, then the derivative of a matrix with regard to another matrix is, according to K.M. Abadir and J.R. Magnus (Abadir, Karim M.; Magnus, Jan R. (2005). Matrix algebra. Cambridge University Press. pp. 351–395.),

$$\frac{\partial F(X)}{\partial X} \;=\; \frac{\partial\,\operatorname{vec} F(X)}{\partial\,(\operatorname{vec} X)^T},$$

while in the definition currently presented in the article,

$$\left(\frac{\partial Y}{\partial X}\right)_{(li)(kj)} \;=\; \frac{\partial Y_{lk}}{\partial X_{ij}}.$$

  Cs32en  22:14, 13 July 2009 (UTC)

Definition and notation of the matrix derivative

Definitions

Let

$$Y = F(X), \qquad X \in \mathbb{R}^{m \times n}, \quad Y \in \mathbb{R}^{p \times q},$$

where m and p are contravariant (vertical) indices, n and q are covariant (horizontal) indices.

Then, we define the "tiled" matrix derivative and the "vectorial" matrix derivative as follows:

Tiled matrix derivative: $\dfrac{\partial Y}{\partial X} = \left(\dfrac{\partial Y}{\partial x_{ij}}\right)_{\!ij}$, the $mp \times nq$ block matrix whose (i, j) block is $\dfrac{\partial Y}{\partial x_{ij}}$

Vectorial matrix derivative: $\dfrac{\partial Y}{\partial X} = \dfrac{\partial\,\operatorname{vec} Y}{\partial\,(\operatorname{vec} X)^T}$, a $pq \times mn$ matrix

Formulae

Formulae for the matrix derivative
Definition | Chain rule | Product rule | Notation conforms with vector calculus notation in engineering | Found in the literature | Sources | Arthur Rubin's position | Cs32en's position
- | ✗ | ✗ | ✓ | ✓ | Pedersen, K.B.; Pedersen, M.S. (November 14, 2008). "The Matrix Cookbook". | ✓ | ✗
- | ✗ | ✗ | ✗ | ✗ | | |
- | ✗ | ✗ | ✓ | ✗ | | |
- | ✗ | ✗ | ✗ | ✗ | | |
- | ✓ | ✓ | ✓ | ✓ | Brookes, M. (2009; first published 2005). "The Matrix Reference Manual". London: Imperial College. | |
- | ✓ | ✓ | ✗ | ✗ | | |
- | ✓ | ✓ | ✓ | ✗ | | |
- | ✓ | ✓ | ✗ | ✓ | Abadir, K.M.; Magnus, J.R. (2005). Matrix algebra. Cambridge University Press. | ✗ | ✓

Sorting out the explanation

The section "Relation to other derivatives" is fairly misleading. You can do component-wise notations for the Fréchet derivative, and that is what this article does. The Fréchet derivative is completely independent of norm for the finite-dimensional case, and any norm is just the same as far as the formulae are concerned. Therefore citations for the chain and product rules (the latter being a case of the "bilinear rule") can be taken from sources for the general thing. The existence of valid formulae where the partials exist but the F-derivative doesn't is an analysis question, really, unlikely to be that important in applications. The tone asserting distinctiveness of the concept of matrix calculus really is inappropriate. Charles Matthews (talk) 10:21, 29 July 2009 (UTC)