Total and Partial Differentials as Algebraically Manipulable Entities

Maria Isabelle Fite; Jonathan Bartlett

doi:10.5772/intechopen.107285

Abstract

Differential operators usually result in derivatives expressed as a ratio of differentials. For all but the simplest derivatives, these ratios are typically not algebraically manipulable, but must be held together as a unit in order to prevent contradictions. However, this is primarily a notational and conceptual problem. The work of Abraham Robinson has shown that there is nothing contradictory about the concept of an infinitesimal differential operating in isolation. In order to make this system extend to all of calculus, however, some tweaks to standard calculus notation are required. Understanding differentials in this way actually provides a more straightforward understanding of all of calculus for students, and minimizes the number of specialized theorems students need to remember, since all terms can be freely manipulated algebraically.

Keywords

differentials
differential operators
derivatives
partial derivatives
total derivatives

Author Information

Maria Isabelle Fite
- University of Tulsa, Tulsa, Oklahoma, United States
Jonathan Bartlett*
- The Blyth Institute, Tulsa, Oklahoma, United States

*Address all correspondence to: jonathan.bartlett@blythinstitute.org

1. Introduction

Derivatives are usually written in a notation, such as $\frac{dy}{dx}$, which implies that there are two distinct values, $dy$ and $dx$, at play. Historically, $dy$ and $dx$ were considered infinitesimal values: values so small that they are practically zero, but not quite zero, and which often became real numbers when put in ratio with each other. This understanding was challenged by practitioners who thought that infinitesimal values were insufficiently rigorous to be used in mathematics.

This led to a reconsideration of derivatives using the concept of a limit. In the limit definition of the derivative, the dy and dx terms do not have independent existences, but exist only within the ratio itself. In this conception, the ratio is merely suggestive of how the derivative was originally produced but does not represent an actual quotient of two distinct values. The limit definition of the derivative has been reinforced by the fact that treating differentials as distinct values leads to contradictions in many cases.

However, the work of Abraham Robinson in the 1960s showed that there was no fundamental flaw in expanding the number system to include infinitesimals. The hyperreal numbers are an extension of the real numbers which allows for infinitesimals and infinities to be constructed in a manner equally rigorous with the real numbers. Additionally, unlike other conceptions of infinities, the hyperreal numbers have an additional advantage that infinitesimals and infinities can be manipulated using arithmetic and algebraic operations.

However, if infinitesimals can be readily considered without contradiction, why does the notation for derivative operations often lead to contradiction? The flaw here is actually in the notation itself. Because the notation was not considered factual but merely suggestive, practitioners tended to ignore the problematic cases rather than solve them. By considering new and more rigorous approaches to notation, a better notation can be developed which includes infinitesimal values, removes the contradictions, and provides a more straightforward understanding of differential notation and formulas. In these new formulations, differentials such as dy and dx are fully independent, algebraically manipulable entities.

2. Problem of separating differentials in modern Leibniz notation

While the problems that occur when trying to separate differentials in modern Leibniz notation are well-known, it is worth revisiting them briefly. First of all, it is interesting to note that there are essentially no inconsistencies or contradictions when dealing with first-order total differentials. For instance, taking the equation $y = x^3$, the derivative is $\frac{dy}{dx} = 3x^2$. Since the derivative of the inverse function is $\frac{dx}{dy}$, this can be found simply by inverting both sides of the equation, so that $\frac{dx}{dy} = \frac{1}{\frac{dy}{dx}} = \frac{1}{3x^2}$. Likewise, integrating is often preceded by multiplying both sides by a differential, so that $\frac{dy}{dx} = 3x^2$ becomes $dy = 3x^2\,dx$.

The problems become more apparent on higher-order derivatives. The typical notation for the second derivative of $y = x^3$ is $\frac{d^2y}{dx^2} = 6x$. However, if the notation were taken seriously, this would be seen as a quotient of the higher-order differential $d^2y$ and the square of $dx$. Doing this, however, would break the chain rule. For instance, if you had $x = t^2$, then you could calculate $\frac{d^2y}{dt^2}$ by simply multiplying $\frac{d^2y}{dx^2}$ by $\left(\frac{dx}{dt}\right)^2$. Doing so, however, yields an incorrect second derivative of $\frac{d^2y}{dt^2} = 24t^4$ rather than the correct $\frac{d^2y}{dt^2} = 30t^4$. This is normally calculated using the chain rule for the second derivative (or higher derivatives using Faà di Bruno’s formula [1]). While the second derivative chain rule works, it provides no algebraic intuition for why it works, and seems to be in conflict with the idea of treating differentials as separable values.
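The mismatch between the two values can be checked directly. The following is a minimal sketch, not part of the original argument; the function names are ours and are chosen only for illustration. It evaluates the naive product and the true second derivative of $y = t^6$ with exact rational arithmetic.

```python
from fractions import Fraction

def naive_second_derivative(t):
    """Treat d2y/dx2 as a plain quotient and multiply it by (dx/dt)**2."""
    x = t ** 2              # x = t^2
    d2y_dx2 = 6 * x         # second derivative of y = x^3 with respect to x
    dx_dt = 2 * t           # derivative of x = t^2 with respect to t
    return d2y_dx2 * dx_dt ** 2

def true_second_derivative(t):
    """Differentiate y = (t^2)^3 = t^6 twice with respect to t."""
    return 30 * t ** 4

for t in (Fraction(1), Fraction(2), Fraction(1, 2)):
    print(t, naive_second_derivative(t), true_second_derivative(t))
```

At every nonzero $t$ the naive value is $24t^4$ and falls short of $30t^4$ by $6t^4$, which is exactly the contribution of the term the standard notation leaves out.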

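Dealing with partial derivatives brings up innumerable problematic cases even for the first derivative. If $f$ is a function of $x$ and $y$, and $x$ and $y$ are both functions of $t$, then the total derivative of $f$ with respect to $t$ is $\frac{df}{dt} = \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt}$. Since $x$ is a function of one variable, $\frac{\partial x}{\partial t} = \frac{dx}{dt}$ (likewise for $y$). Then the equation becomes $\frac{df}{dt} = \frac{\partial f}{\partial x}\frac{\partial x}{\partial t} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial t}$. Treating the partial differentials as distinct values, this reduces to $\frac{df}{dt} = \frac{\partial f}{\partial t} + \frac{\partial f}{\partial t}$.² Now that it is expressed in terms of a single variable, $\frac{df}{dt} = \frac{\partial f}{\partial t}$, so this yields $\frac{df}{dt} = \frac{df}{dt} + \frac{df}{dt} = 2\frac{df}{dt}$. Dividing both sides by $\frac{df}{dt}$ yields the contradiction $1 = 2$.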

As will be described, the issues in these problematic cases stem from deficiencies in the notation, not deficiencies in the concept of differentials as infinitesimals nor in the idea that differentials can be considered independently of each other. By taking a more rigorous approach to the development of the notation of higher order derivatives and partial derivatives, a straightforward notation can be obtained which enables differentials to be considered as fully distinct values.

3. Historical formal definitions of the derivative

The derivative of a function measures how the function changes as the independent variable varies. For instance, if the derivative of a function $f(x)$ is 3 when $x = 5$, that means $f(x)$ is increasing at a rate of 3 units up for every 1 unit across when $x$ is 5. Another way to say the same thing is that the function’s slope at $x = 5$ is $3/1 = 3$.

Normally, slope is defined with reference to two points. When measuring velocity, for instance, which is the ratio of the change in position to the change in time, one would measure two different times with their positions and compare them. The derivative attempts to calculate the slope using only one point together with an equation. Since only one point is used, the change in x is infinitely small, and so is the change in y. Different ways of dealing with these infinities lead to different formal definitions of the derivative.

3.1 Newton’s definition

Isaac Newton provided one of the first definitions of a derivative in his book Methodus fluxionum et serierum infinitarum, or “The Method of Fluxions and Infinite Series” in English [2, 3]. Newton thought of his graphs as being drawn over time, with the x-coordinate increasing at a constant speed while the rate of increase in the y-coordinate varied. A variable’s rate of change with respect to time (what we would now call a derivative with respect to time) was called a “fluxion,” which was denoted by applying a dot above a variable, such as ẋ (which represents the derivative of x with respect to time) [3].

To avoid having to define an infinitely small quantity, Newton worked with full derivatives, ratios of infinitesimals. Since Newton assumed all his variables depended on time, he could then switch out the infinitesimal change in x and change in y for the change in x over time and the change in y over time, which were both real numbers. The ratio remained the same, and the infinities were avoided [3].

3.2 Leibniz’s definition

Unlike Newton, Gottfried Leibniz preferred to consider the change in $x$ and the change in $y$ separately. He used the notation $dx$ for an infinitesimal difference in $x$ and $dy/dx$ for a ratio of infinitesimals, which represented the slope of a curve at a point. Leibniz considered $d$ an operator, with $d(x) = dx$ being the output of $d$ acting on the variable $x$. This allowed him to apply $d$ more than once, resulting in $d^2x = d(dx)$, $d^3x = d(d(dx))$, and so on. Just as $dx$ was infinitely smaller than $x$, Leibniz said $d^n x$ was infinitely smaller than $d^{n-1}x$ [3].

Although his calculus relied on the concept of an infinitesimal, Leibniz regarded infinitesimals as only “purely ideal entities... useful fictions, introduced to shorten arguments and aid insight” [3]. However, Leibniz was never able to rigorously define his infinitesimals nor how they behaved. Therefore, while they seemed to work well, the lack of clarity caused some skeptics to regard them with suspicion, ridiculing them as “ghosts of departed quantities” [4].

3.3 Delta-epsilon (limit) definition

Concerns about the fishy nature of infinitesimals, treated like nonzero numbers when dividing but also like zero when adding, led to the reformulation of calculus using the idea of limits. The limit of $f(x)$ as $x$ approaches $a$ is the value $f(x)$ approaches as $x$ becomes closer to $a$.

More precisely, the limit of $f(x)$ as $x$ approaches $a$ is $L$ if for any given positive number $\varepsilon$ there is a corresponding positive number $\delta$ such that the difference between $f(x)$ and $L$ is less than $\varepsilon$ whenever $x \neq a$ and the difference between $x$ and $a$ is less than $\delta$ [5].

Limits can then be used to define the derivative of a function $f(x)$ as

$$f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h} \tag{1}$$

When limits are used to define a derivative, it makes no sense to pull apart the change in x and the change in y, as both the limit of the numerator and the limit of the denominator evaluate to zero, and division by zero is undefined.
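As a small illustration of the limit definition, the sketch below evaluates the difference quotient of the hypothetical example $f(x) = x^3$ at $x = 2$ for shrinking values of $h$. The function names are ours, and exact rational arithmetic is used so that no rounding obscures the pattern.

```python
from fractions import Fraction

def difference_quotient(f, x, h):
    """Slope of the secant line through (x, f(x)) and (x + h, f(x + h))."""
    return (f(x + h) - f(x)) / h

def cube(x):
    return x ** 3

x = Fraction(2)
for k in range(1, 6):
    h = Fraction(1, 10 ** k)
    # For f(x) = x^3 the quotient equals 3x^2 + 3xh + h^2 exactly.
    print(h, difference_quotient(cube, x, h))
```

The quotient approaches $3x^2 = 12$ as $h$ shrinks, but it cannot be evaluated at $h = 0$ itself, where numerator and denominator are both zero.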

4. Hyperreal numbers and the definition of the derivative

While the limit definition of a derivative solves the philosophical problems of infinitesimals, it does not allow the change in y to be separated from the change in x. This led Abraham Robinson to return to Leibniz’s infinitesimals in 1958, putting them on a new set-theoretic foundation and creating the field of nonstandard analysis [3].

While there are different ways to construct hyperreal numbers, the approach we will take here is based on the set theory approach described by Herrmann in [6], with many of the definitions taken from there as well. We will begin by describing hyperreal numbers (including infinitesimals), and then describe the differential operator as being an operator that can be applied using infinitesimals.

For defining the infinitesimals, the core idea is to take the set of all infinitely long sequences of real numbers, denoted $\mathbb{R}^{\mathbb{N}}$. Some of these sequences match other sequences so closely they can be considered equivalent. Each real number is then assigned to a set of equivalent sequences. Then, some of the remaining sets of equivalent sequences can be assigned to infinitesimals. Finally, all the operations normally done on real numbers can be translated to operations between sets of equivalent sequences.

4.1 Filters, the cofinite filter, and free ultrafilters: Defining big enough

A filter provides a way to classify subsets of a set as either big enough or not big enough.

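Let $X$ be a nonempty set. A nonempty subset $F$ of the set of all subsets of $X$ is a proper filter on $X$ if and only if:

$$\text{(i) for each } A, B \in F,\ A \cap B \in F \tag{2}$$

$$\text{(ii) if } A \subset B \subset X \text{ and } A \in F, \text{ then } B \in F \tag{3}$$

$$\text{(iii) } \emptyset \notin F \tag{4}$$

The cofinite filter $C$ is defined as

$$C = \{x \mid x \subset X \text{ and } X - x \text{ is finite}\} \tag{5}$$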

where X is an infinite set. C is called the cofinite filter because a subset x of X gets to be in the filter C if and only if X without x is a finite set. C gives a mathematical way to define whether an infinite set is considered big enough.

For instance, if C is the cofinite filter on R, the real numbers, the set of all integers ℤ is not big enough to be in C, even though it is an infinite subset of R, because there are infinitely many real numbers that are not integers. However, R∗, the real numbers excluding zero, is big enough to be a member of C, because there is only one real number, zero, that is not in the real numbers excluding zero.
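The filter properties of the cofinite filter can be made concrete on the natural numbers. The sketch below is only an illustration under our own assumptions: the class name and its methods are hypothetical, and each cofinite subset of the naturals is stored by its finite complement, since the set itself is infinite.

```python
class CofiniteSubsetOfN:
    """A subset of the natural numbers, stored by its finite complement."""

    def __init__(self, missing):
        self.missing = frozenset(missing)   # the finitely many excluded naturals

    def __contains__(self, n):
        return n not in self.missing

    def intersection(self, other):
        # The complement of an intersection is the union of the complements,
        # which is still finite, so the result is again cofinite.
        return CofiniteSubsetOfN(self.missing | other.missing)

    def is_subset_of(self, other):
        # A is a subset of B exactly when B excludes nothing that A contains.
        return other.missing <= self.missing

no_zero = CofiniteSubsetOfN({0})
no_small = CofiniteSubsetOfN({0, 1, 2})
both = no_zero.intersection(no_small)
print(sorted(both.missing))
```

Closure under intersection holds because a union of two finite complements is finite. Any superset of a cofinite set has an even smaller complement, so it is cofinite as well. The empty set cannot be represented at all in this scheme, because its complement is all of the naturals, which is not finite.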

An ultrafilter on a given infinite set $X$ is a maximal proper filter on $X$: a proper filter that is not strictly contained in any other proper filter on $X$. An ultrafilter that has $C$ as a subset is called a free ultrafilter.

4.2 Equivalence classes of Rℕ: Classifying equivalent sequences together

Let $\mathbb{R}^{\mathbb{N}}$ represent the set of all sequences with domain $\mathbb{N}$ and range values in $\mathbb{R}$. Let $A$ and $B$ be two sequences in $\mathbb{R}^{\mathbb{N}}$. $A$ is said to be equivalent to $B$, written $A =_U B$, if a sufficiently large number of their elements match, or

$$A =_U B \iff \{n \mid A_n = B_n\} = S \in U \tag{6}$$

The free ultrafilter $U$ determines whether the set of matching elements is big enough.

This relation $=_U$ is an equivalence relation on $\mathbb{R}^{\mathbb{N}}$, so it partitions $\mathbb{R}^{\mathbb{N}}$ into equivalence classes. Each equivalence class $[A]$ contains all the sequences in $\mathbb{R}^{\mathbb{N}}$ that are equivalent to $A$, including $A$ itself.

The set of all these equivalence classes is called the set of the hyperreal numbers, denoted ${}^{*}\mathbb{R}$.

4.3 Connecting the real numbers to the hyperreals

We can define a function $f$ that takes each $x \in \mathbb{R}$ and gives the unique $[R]$, where $\{n \mid R_n = x\} \in U$. This function $f$ assigns to each real number $x$ a hyperreal number $[R]$, namely the set of all sequences where a sufficiently large number of each sequence’s elements is $x$. Often, $f(x)$ is represented by ${}^{*}x$. For instance, the hyperreal ${}^{*}3$ is the set of all sequences equivalent ($=_U$) to $\langle 3, 3, 3, \ldots \rangle$.

Most applications of math use real numbers, so it is helpful to define the subset of the hyperreals that corresponds to the real numbers. The image of a subset $X$ of $\mathbb{R}$ under $f$ is denoted ${}^{\sigma}X$. Each hyperreal number ${}^{*}x$ in ${}^{\sigma}X$ corresponds to a real number $x$ in $X$. Since $\mathbb{R}$ is a subset of $\mathbb{R}$, ${}^{\sigma}\mathbb{R}$ is the subset of the hyperreals that corresponds to the real numbers.

4.4 Operations on the hyperreals

In order for algebra in ${}^{*}\mathbb{R}$ to replace algebra in the real numbers, operations like $+$ and $\cdot$, among others, have to be defined between members of ${}^{*}\mathbb{R}$. It is also useful to define the relation $\le$ and the absolute value function.

Let $a$, $b$, and $c$ be elements of ${}^{*}\mathbb{R}$, and let ${}^{*}{+}\colon {}^{*}\mathbb{R} \times {}^{*}\mathbb{R} \to {}^{*}\mathbb{R}$ be defined as

$$a \mathbin{{}^{*}{+}} b = c \iff \{n \mid A_n + B_n = C_n\} \in U \tag{7}$$

for any $A_n \in a$, $B_n \in b$, and $C_n \in c$. That is, the sum of two elements of ${}^{*}\mathbb{R}$, $a$ and $b$, is equal to another element of ${}^{*}\mathbb{R}$, $c$, if and only if a sufficiently large number of the elements of the sequences $A_n + B_n$ and $C_n$ match, for any sequence $A_n$ in $a$, $B_n$ in $b$, and $C_n$ in $c$. Hyperreal multiplication (${}^{*}{\cdot}$) can be defined similarly.

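To construct a hyperreal less-than-or-equal relation, for each $a = [A]$, $b = [B] \in {}^{*}\mathbb{R}$ define

$$a \mathrel{{}^{*}{\le}} b \iff \{n \mid A_n \le B_n\} \in U \tag{8}$$

That is, $a \mathrel{{}^{*}{\le}} b$ if and only if, given any sequence in $a$ and any sequence in $b$, a sufficiently large number of elements in $a$’s sequence are less than or equal to their corresponding elements in $b$’s sequence.

These operations establish the structure $\langle {}^{*}\mathbb{R}, {}^{*}{+}, {}^{*}{\cdot}, {}^{*}{\le} \rangle$ as a totally ordered field, with ${}^{*}0$ as the identity for ${}^{*}{+}$ and ${}^{*}1$ as the identity for ${}^{*}{\cdot}$ ([6], p. 11).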

Finally, the absolute value function can be defined for members $a \in {}^{*}\mathbb{R}$ with

$${}^{*}|a| = |a| = b \iff \{n \mid |A_n| = B_n\} \in U \tag{9}$$

The absolute value of a hyperreal number $a$ is a hyperreal number $b$ if and only if, given a sequence in $a$ and a sequence in $b$, a sufficiently large number of elements in $b$’s sequence match the absolute value of their corresponding elements in $a$’s sequence.

In summary, +, ⋅, ≤ and the absolute value function, which are defined on the real numbers, can be translated to operations on the hyperreal numbers.

4.5 Infinitesimals in the hyperreals

Not all of the members of ${}^{*}\mathbb{R}$ correspond to real numbers, because not all sequences of real numbers are constant sequences. Some of the remaining hyperreals correspond to infinitesimals.

A hyperreal number $a$ is infinitely large if

$${}^{*}x < |a| \text{ for each } {}^{*}x \in {}^{\sigma}\mathbb{R} \tag{10}$$

or in other words, if its absolute value is bigger than every hyperreal that corresponds to a real number.

A hyperreal number $b$ is an infinitesimal, or as Newton stated, infinitely small, if

$${}^{*}0 \le |b| < {}^{*}x \text{ for each } 0 < x \in \mathbb{R}. \tag{11}$$

In other words, a hyperreal is an infinitesimal if its absolute value is bigger than or equal to ${}^{*}0$ and yet smaller than every hyperreal that corresponds to a positive real number.

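Notice that ${}^{*}0$, which is the equivalence class that contains $\langle 0, 0, 0, \ldots \rangle$, is the trivial infinitesimal.

For a nontrivial example of an infinitesimal, consider the equivalence class $g$ containing the sequence $\langle 0, 1, \tfrac{1}{2}, \tfrac{1}{3}, \tfrac{1}{4}, \ldots \rangle$. “Then $g \neq {}^{*}0$. Now for each $x \in \mathbb{R}^{+}$ there is some $m \in \mathbb{N}$, $m \neq 0$ such that $0 < \frac{1}{m} < x$. Thus ${}^{*}0 < \frac{{}^{*}1}{{}^{*}m} < {}^{*}x$. ... [and] $g$ is an infinitesimal” ([6], p. 17).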
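To see why this sequence ends up below every positive real, it helps to count how often the comparison fails. The sketch below is a hypothetical illustration: it can only inspect a finite prefix of the sequence, and the names `g` and `failing_indices` are ours. It compares the sequence with the positive real $x = \frac{1}{100}$.

```python
from fractions import Fraction

def g(n):
    """Element n of the sequence 0, 1, 1/2, 1/3, ..."""
    return Fraction(0) if n == 0 else Fraction(1, n)

def failing_indices(x, limit):
    """Indices n < limit at which the comparison g(n) < x fails."""
    return [n for n in range(limit) if not g(n) < x]

x = Fraction(1, 100)
bad = failing_indices(x, 10_000)
print(len(bad), bad[0], bad[-1])
```

Only the indices 1 through 100 fail, and no index beyond 100 can fail because $\frac{1}{n} < \frac{1}{100}$ whenever $n > 100$. The set of indices where the comparison holds is therefore cofinite, so it belongs to every free ultrafilter. The same argument works for any positive real $x$.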

4.6 Division with infinitesimals

If infinitesimals are smaller than every real number, can you still divide by them?

Consider a nonzero infinitesimal, say $\varepsilon$, and a sequence in $\varepsilon$, say $A$. Even if some of $A$’s elements are zeros, $\varepsilon \neq {}^{*}0$, so the set of indices at which $A$ is zero is not big enough to be in the ultrafilter $U$. Because $U$ is an ultrafilter, the complementary set, the indices at which $A$ is nonzero, is therefore in $U$. It is then possible to define another sequence $B$ where $B_n = \frac{1}{A_n}$ if $A_n \neq 0$ and $B_n = 0$ if $A_n = 0$. $B$ satisfies the property $[A] \mathbin{{}^{*}{\cdot}} [B] = {}^{*}1$, and so $[B]$ is the multiplicative inverse of $\varepsilon = [A]$.

In summary, even if there are sequences in $\varepsilon$ with zeros, $\frac{1}{\varepsilon}$ is still defined, and so it is still possible to divide by $\varepsilon$ ([6], p. 11).
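The construction of the inverse sequence can be sketched on a finite prefix. The sequence `A` below is a hypothetical representative chosen for illustration, with a few zero entries at the start; `B` is built from it by taking reciprocals of the nonzero entries and using 0 elsewhere.

```python
from fractions import Fraction

def A(n):
    """A representative of an infinitesimal; its first three entries are zero."""
    return Fraction(0) if n < 3 else Fraction(1, n)

def B(n):
    """Entrywise reciprocal of A, with 0 wherever A is 0."""
    return Fraction(0) if A(n) == 0 else 1 / A(n)

# Indices in a finite prefix where the entrywise product differs from 1.
mismatches = [n for n in range(1000) if A(n) * B(n) != 1]
print(mismatches)
```

The product differs from 1 only at the three indices where `A` is zero. Since that set is finite, the set of indices where the product equals 1 is cofinite and hence lies in the free ultrafilter, which is what makes the product of the two classes equal to the hyperreal 1.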

4.7 The standard and principal part functions

Hyperreal expressions can be converted into real expressions using the standard part function, st, which yields the closest real number to the hyperreal expression. The standard part of an infinitesimal number is always zero. For infinite values, the standard part yields +∞ or −∞, which is the non-specific infinity indicating that the value is out of range of the real numbers.

The principal part function, pt, will yield the most significant component of a hyperreal expression [7]. In a hyperreal expression, imagine $\omega$ representing a benchmark infinite value, with $\varepsilon = \frac{1}{\omega}$ representing an associated benchmark infinitesimal. The hyperreal expression $-2\omega^2 + \omega - 5 + 3\varepsilon$ represents four different orders of infinity. The most significant one is $-2\omega^2$, and, thus, it is the principal part. For the infinitesimal expression $5\varepsilon^2 + \varepsilon^3$, $5\varepsilon^2$ is the principal part.

The principal part of a hyperreal expression is important because non-principal parts, being infinitely less significant than the principal part by definition, do not affect the large-scale behaviors of smooth and continuous functions.

4.8 Differentials and derivatives using hyperreals

The derivative of a function $y = f(x)$ using the hyperreals is denoted $\frac{dy}{dx}$, the change in $y$ divided by the change in $x$, just like using Leibniz’s notation. However, we can actually define the differentials themselves as infinitesimals, without referring to ratios.

Many have a hard time conceiving of just what a differential is and means. It is easy enough to say that a differential is an infinitesimal, but how exactly are individual differentials defined, especially when not being examined in the context of a derivative? What exactly does the higher-order notation $d^2y$ mean?

Let us first remember that, in order to be in a relation, two (or more) variables have to be related to each other in some way. Therefore, we can imagine some variable, let us call it q, not explicitly mentioned in the equation, which is in some sense the “ultimate” independent variable.

Note that this variable does not need to be explicitly defined. In fact, it is better if it is not defined explicitly. The reason for this is that defining q explicitly means that there is some chance that there exists yet another deeper, more fundamental variable. What we are looking for is the deepest, most fundamental, most independent variable. Keeping q as a hypothetical independent variable means that our reasoning will continue to hold in the face of finding more and more fundamental quantities. Our reasoning about an actual variable may fail to hold if it is found to not be the fundamental quantity. We will imagine q to be smoothly increasing by the infinitesimal ε.

Since $q$ is the ultimate variable that relates every other variable in the equation, every variable can (theoretically) be written in terms of $q$. $y$ is actually shorthand for $y(q)$, $x$ is shorthand for $x(q)$, and so on. We can then define the differential of an expression (including just a variable) to be the simple difference between the expression at some value $q + \varepsilon$ and the expression at some value $q$. When taking the differential of a variable, we will use the shorthand $dy$ to mean $d(y)$.

$$dy = d(y) = y(q + \varepsilon) - y(q) \tag{12}$$

Note that $dy$ is also a function of $q$ (this fact will become useful when finding the second differential). Additionally, assuming that $y$ is a smooth and continuous function of $q$, an infinitesimal change in $q$ will lead to an infinitesimal change in $y$, so $dy$ will also be infinitesimal.

We can also rearrange (12) and obtain

$$y(q + \varepsilon) = y(q) + dy \tag{13}$$

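These definitions provide a generic definition for the differential and consequent manipulation techniques that can be applied to any expression. Let us take the simple example $y = x^2$ (which is $y(q) = (x(q))^2$) and apply this differential operator to it. We will also apply the principal part function at the end in order to simplify the expression to its most consequential portion.

$$\begin{aligned}
y &= x^2 \\
d(y) &= d(x^2) && \text{differential operator} \\
y(q + \varepsilon) - y(q) &= (x(q + \varepsilon))^2 - (x(q))^2 && \text{applying (12)} \\
dy &= (x(q) + dx)^2 - (x(q))^2 && \text{applying (13)} \\
dy &= (x(q))^2 + 2x(q)\,dx + dx^2 - (x(q))^2 && \text{simplifying} \\
dy &= 2x(q)\,dx + dx^2 \\
dy &= 2x(q)\,dx && \text{principal part} \\
dy &= 2x\,dx && \text{shorthand}
\end{aligned}$$

The second differential follows the same process. It is merely the differential operator applied to an expression that itself contains differentials. $dy$ is actually $(dy)(q)$, but we will refer to it as $dy_q$ and $dy_{q+\varepsilon}$ for a compromise of brevity and clarity. The notation $d^2y$ will likewise be shorthand for $d(dy_q)$.

$$\begin{aligned}
dy &= 2x\,dx \\
d(dy) &= d(2x\,dx) && \text{differential operator} \\
&= 2x_{q+\varepsilon}\,dx_{q+\varepsilon} - 2x_q\,dx_q && \text{applying (12)} \\
&= 2(x_q + dx_q)(dx_q + d(dx_q)) - 2x_q\,dx_q && \text{applying (13)} \\
&= 2x_q\,dx_q + 2x_q\,d(dx_q) + 2\,dx_q^2 + 2\,dx_q\,d(dx_q) - 2x_q\,dx_q && \text{simplifying} \\
&= 2x_q\,d(dx_q) + 2\,dx_q^2 + 2\,dx_q\,d(dx_q) \\
&= 2x_q\,d(dx_q) + 2\,dx_q^2 && \text{principal part} \\
d^2y &= 2x\,d^2x + 2\,dx^2 && \text{shorthand}
\end{aligned}$$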

This second differential will typically be a second order infinitesimal. The process can be further repeated for higher order differentials.

The $2x\,d^2x$ term here may be surprising, but the reason for it will become clear in Section 5 when we eliminate the contradictions present in the standard notation for higher-order differentials.
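The role of both terms in the second differential of $y = x^2$ can be checked numerically. The sketch below is an approximation under stated assumptions, not the hyperreal construction itself: a small rational `EPS` stands in for the infinitesimal increment of $q$, and $x(q) = q^3$ is an arbitrary choice of ours. The operator `d` is the plain difference "value at $q$ plus the increment, minus value at $q$".

```python
from fractions import Fraction

EPS = Fraction(1, 10 ** 6)   # small rational standing in for the infinitesimal

def d(f):
    """Differential operator: maps f to the function q -> f(q + EPS) - f(q)."""
    return lambda q: f(q + EPS) - f(q)

def x(q):
    return q ** 3            # hypothetical choice of x(q), for illustration only

def y(q):
    return x(q) ** 2         # y = x^2

q = Fraction(2)
dx = d(x)(q)                 # first differential of x at q
d2x = d(d(x))(q)             # second differential of x at q
d2y_exact = d(d(y))(q)       # second differential of y, computed directly
d2y_formula = 2 * x(q) * d2x + 2 * dx ** 2

ratio = d2y_exact / d2y_formula
print(float(ratio))
```

The ratio differs from 1 only by a quantity of the order of `EPS`, which corresponds to the non-principal terms that the principal part function discards. Dropping the term containing the second differential of $x$ from `d2y_formula` would leave a ratio far from 1, since with $x = q^3$ that term is not negligible.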

Since all variables in the equation are related to each other, they also share some relationship to $q$. Therefore, the differential can be defined universally within an equation without taking into account the specifics of the variables encountered.

Ultimately, taking the differential of a function results in a dy, dx, or some other term. However, these terms’ definitions are ultimately rooted in this ultimate independent variable q, and the results of incrementing it by some hyperreal infinitesimal ε.

The derivative, then, is simply a ratio of differentials defined in this way. While the terminology of “taking the derivative with respect to x” can still be used, there is no longer anything special about taking the derivative with respect to a variable as opposed to simply dividing by that variable’s differential. Additionally, this expands the ability to take total differentials straightforwardly into multivariable situations, providing that all variables can be, in principle, tied back to some underlying construct like q.

5. Extending the total derivative’s algebraic manipulability

The hyperreal definition of the derivative has several advantages. Once hyperreal numbers are defined, the definition of the derivative arises naturally from considering the change in a function when its (theoretical) independent variable changes infinitesimally. Unlike the limit definition, the change in $y$ and the change in $x$ are separate entities. Using hyperreal numbers, we can rigorously define these entities so that they are manipulable using standard algebraic operators.

However, this requires that we rethink some of the notations from first principles. First of all, now that $dy$ and $dx$ are reified entities, they must be considered in applying such rules as the product rule and the quotient rule. This is straightforward, and the rules are identical to normal calculus rules. The differential of $x^2\,dx$ is the result of applying the product rule to the product of $x^2$ and $dx$, namely $2x\,dx^2 + x^2\,d^2x$.

When this is taken into account, differentials of any order become algebraically manipulable.

5.1 The second derivative

Before taking this idea of algebraically manipulable differentials too far, we need to note that the standard notation for the second derivative, $\frac{d^2y}{dx^2}$, does not work in this manner. The problem here is that it implies an improper order of operations [8].

Order of operations is very important when doing derivatives. When doing a derivative, one first takes the differential and then divides by dx. The second derivative is the derivative of the first, so the next differential occurs after the first derivative is complete, and the process finishes by dividing by dx again.

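However, what does it look like to take the differential of the first derivative? Basic calculus rules tell us that the quotient rule should be used:

$$d\left(\frac{dy}{dx}\right) = \frac{dx\,d(dy) - dy\,d(dx)}{dx^2} = \frac{d^2y}{dx} - \frac{dy}{dx}\frac{d^2x}{dx}$$

Then, for the second step, this can be divided by $dx$, yielding:

$$\frac{d\left(\frac{dy}{dx}\right)}{dx} = \frac{d^2y}{dx^2} - \frac{dy}{dx}\frac{d^2x}{dx^2} \tag{14}$$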

This, in fact, yields a notation for the second derivative which is equally algebraically manipulable as the first derivative. It is not very pretty or compact, but it works algebraically.
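Formula (14) can be tried on the example from Section 2, $y = x^3$ with $x = t^2$. The sketch below makes two assumptions of ours: $t$ is taken as the underlying independent variable, so that its second differential is zero, and the principal parts of the differentials are written out by hand from $x = t^2$ and $y = t^6$. The function name is hypothetical.

```python
from fractions import Fraction

def second_derivative_wrt_x(t, eps):
    """Evaluate (14) for y = x^3, x = t^2, with t playing the role of q."""
    # Principal parts of the differentials, written out by hand.
    dx = 2 * t * eps                # from x = t^2
    d2x = 2 * eps ** 2
    dy = 6 * t ** 5 * eps           # from y = t^6
    d2y = 30 * t ** 4 * eps ** 2
    naive = d2y / dx ** 2           # the quotient read literally
    corrected = d2y / dx ** 2 - (dy / dx) * (d2x / dx ** 2)
    return naive, corrected

t = Fraction(3)
naive, corrected = second_derivative_wrt_x(t, Fraction(1, 1000))
print(naive, corrected, 6 * t ** 2)   # prints: 135/2 54 54
```

The corrected expression reproduces the second derivative $6x = 6t^2$, while the literal quotient of $d^2y$ by $dx^2$ does not. The increment cancels out of both results, so the particular value passed for `eps` does not matter.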

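The chain rule for the second derivative fits this algebraic notation correctly, provided we replace each instance of the second derivative with its full form (cf. (30)):

$$\frac{d^2y}{dt^2} - \frac{dy}{dt}\frac{d^2t}{dt^2} = \left(\frac{d^2y}{dx^2} - \frac{dy}{dx}\frac{d^2x}{dx^2}\right)\left(\frac{dx}{dt}\right)^2 + \frac{dy}{dx}\left(\frac{d^2x}{dt^2} - \frac{dx}{dt}\frac{d^2t}{dt^2}\right) \tag{15}$$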

This in fact works out perfectly algebraically.³

5.2 Higher order derivatives

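The notation for the third and higher derivatives can be found using the same techniques as for the second derivative. To find the third derivative of $y$ with respect to $x$, one starts with the second derivative, takes the differential, and divides by $dx$:

$$\frac{d\left(\frac{d\left(\frac{dy}{dx}\right)}{dx}\right)}{dx} = \frac{d\left(\frac{d^2y}{dx^2} - \frac{dy}{dx}\frac{d^2x}{dx^2}\right)}{dx} = \frac{d^3y}{dx^3} - \frac{dy}{dx}\frac{d^3x}{dx^3} - 3\frac{d^2x}{dx^2}\frac{d^2y}{dx^2} + 3\frac{dy}{dx}\frac{(d^2x)^2}{dx^4} \tag{17}$$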

Because the expanded notation for the second and higher derivatives is much more verbose than that for the first derivative, it is often useful for clarity and succinctness to write derivatives using a slight modification of Arbogast’s D notation (see [9]) for the total derivative instead of writing it as algebraic differentials. Here, we will also subscript the $D$ with the variable with respect to which the derivative is being taken, and supply in the superscript the number of derivatives we are taking. Therefore, where Arbogast would write simply $D$, this notation would be written as $D_x^1$.

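Below are the second and third derivatives of $y$ with respect to $x$ written using both the enhanced Arbogast notation and as a ratio of differentials.

$$D_x^2 y = \frac{d^2y}{dx^2} - \frac{dy}{dx}\frac{d^2x}{dx^2} \tag{18}$$

$$D_x^3 y = \frac{d^3y}{dx^3} - \frac{dy}{dx}\frac{d^3x}{dx^3} - 3\frac{d^2x}{dx^2}\frac{d^2y}{dx^2} + 3\frac{dy}{dx}\frac{(d^2x)^2}{dx^4} \tag{19}$$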

This gets even more important as the number of derivatives increases. Each one is more unwieldy than the previous one. However, each level can be converted to differential notation as follows:

$$D_x^n y = \frac{d\left(D_x^{n-1} y\right)}{dx} \tag{20}$$

The advantage of this modification of Arbogast’s notation over Lagrangian notation is that it clearly specifies both the variable/expression whose derivative is being taken and the variable/expression it is being taken with respect to.

Therefore, when a compact representation of higher order derivatives is needed, this paper will use Arbogast’s notation for its clarity and succinctness. This notation can be easily expanded to its differentials when necessary for manipulation.

6. Extending the partial derivative’s algebraic manipulability

The derivative gives the rate at which a function $f$ changes when $x$ is increased. But what if $f$ depends on both $x$ and $y$? Imagine a hill where $f$ is the distance above sea level, $x$ is the distance east from the origin, and $y$ is the distance north from the origin. To find how $f$ is changing, a direction to measure the slope must be picked. Along the direction straight east, only $x$ is changing while $y$ stays constant. This slope is the partial derivative of $f$ with respect to $x$, denoted $\frac{\partial f}{\partial x}$, the change in $f$ over the change in $x$ when $x$ is the only variable allowed to change ([5], pp. 940–941). A derivative where all the independent variables are allowed to change is called a total derivative, like the two-dimensional derivative $\frac{dy}{dx}$. This partial derivative can be formally defined using limits or using hyperreals.

Using limits, the partial derivative of $f(x, y)$ at the point $(a, b)$ with respect to $x$ is $\lim_{h \to 0} \frac{f(a+h, b) - f(a, b)}{h}$ ([5], p. 941). Likewise, the partial derivative of $f(x, y)$ with respect to $x$ is $\lim_{h \to 0} \frac{f(x+h, y) - f(x, y)}{h}$. For more than two variables, the partial derivative of $f(x_1, x_2, \ldots)$ with respect to $x_1$ is

$$\frac{\partial f}{\partial x_1} = \lim_{h \to 0} \frac{f(x_1 + h, x_2, \ldots) - f(x_1, x_2, \ldots)}{h} \tag{21}$$

As with the total derivative, using limits to define the partial derivative means the change in $f$ and the change in $x$ are not defined separately and must be kept together. Using hyperreals, the partial derivative of $f$ with respect to $x_1$ is

$$\frac{\partial f}{\partial x_1} = \frac{f(x_1 + dx_1, x_2, \ldots) - f(x_1, x_2, \ldots)}{dx_1} \tag{22}$$

Also, $dx_1$ can equal $\partial x_1$ assuming both of them denote the smallest change in $x_1$ possible. This is not an equation in the real numbers; it is an equation in the hyperreals.

Both the numerator and denominator of $\frac{\partial f}{\partial x_1}$ have meaning on their own, and they both are specific hyperreals. So it should be possible to separate the fraction without problems.

However, the current notation for $\partial f$ does not distinguish between the change in $f$ when $x_1$ is allowed to change and the change in $f$ when another variable, say $x_2$, is allowed to change. In other words, the $\partial f$ in $\frac{\partial f}{\partial x_1}$ is a different hyperreal from the $\partial f$ in $\frac{\partial f}{\partial x_2}$, even though they both use the exact same symbol. This can cause problems if the notation is taken seriously (see the contradiction noted in Section 2). Adding more information to the notation resolves this issue.

The notation for the partial derivative should be changed from $\frac{\partial f}{\partial x}$ to $\frac{\partial(f, x)}{dx}$ in order to preserve the information in the numerator when the fraction is separated.

This makes it clear that $\partial$ is an operator that takes as an argument not only $f$ but also the choice of which variable to vary. The function that $\partial$ acts on, in this case $f$, is the first argument of $\partial$ and every argument after the first is a variable allowed to change. This can lead to expressions like $\partial(f, x, y)$, the change in $f$ when both $x$ and $y$ are allowed to vary.

Using this notation, $\frac{df}{dt}$ equals $\frac{\partial(f, x)}{dt} + \frac{\partial(f, y)}{dt}$, not $\frac{df}{dt} + \frac{df}{dt}$. The contradictions are resolved, and the partial derivative fraction can be separated. The numerator and denominator can be moved around just like any other algebraic expression, keeping in mind both of them are hyperreals, so technically any operations on them should be hyperreal operations.

Because the new notation can be algebraically manipulated without contradictions, it makes possible new equations where infinitesimals are not confined to ratios. For instance, the resolved contradiction proof gave the equation $df = \partial(f, x) + \partial(f, y)$. This is reminiscent of one of the conditions for differentiability, $\Delta f = f_x(a, b)\,\Delta x + f_y(a, b)\,\Delta y + \varepsilon_1 \Delta x + \varepsilon_2 \Delta y$, where for fixed $a$ and $b$, $\varepsilon_1$ and $\varepsilon_2$ are functions that depend only on $\Delta x$ and $\Delta y$, with $(\varepsilon_1, \varepsilon_2) \to (0, 0)$ as $(\Delta x, \Delta y) \to (0, 0)$ ([5], p. 947).

Besides simplifying old equations, with the new notation it is possible to consider individual partial changes when building equations, just like considering individual total changes.

The new notation can also denote expressions like $\partial(f, x_1, x_2)$, the change in $f(x_1, x_2, x_3)$ when $x_1$ and $x_2$ are allowed to vary, but $x_3$ must stay constant. With the current notation $\partial f$, dealing with these situations is clumsy at best.

$\partial(f, x_1)$ is an infinitesimal with meaning on its own. It can be defined analogously to Eq. 12:

$$\partial(f, x_1) = f(x_1 + dx_1, x_2, \ldots) - f(x_1, x_2, \ldots) \tag{23}$$

The total differential of f is usually defined as the combination of all of the changes in f depending on each variable. Typically, the total differential of a multivariate function is found using the sum of its partial derivatives multiplied by their respective differentials.

$$d\,f(x_1, x_2, \ldots) = \frac{\partial f}{\partial x_1}\,dx_1 + \frac{\partial f}{\partial x_2}\,dx_2 + \ldots \tag{24}$$

Using the new definition of the partial differential, we can rewrite the formula much more straightforwardly, where the total differential is simply a sum of its partial differentials.

$$d\,f(x_1, x_2, \ldots) = \partial(f, x_1) + \partial(f, x_2) + \ldots \tag{25}$$
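Eqs. 23 and 25 can be illustrated numerically. The following is a minimal sketch, assuming that a small finite step may stand in for an infinitesimal; the helper names `partial_diff` and `total_diff` and the sample function `f` are hypothetical and chosen only for illustration. With finite steps, the sum of partial changes matches the total change only up to second-order terms (products of the steps), which is the finite analogue of discarding higher-order infinitesimals.

```python
def partial_diff(f, point, index, step):
    """Finite-step analogue of Eq. 23: change in f when only point[index] moves."""
    shifted = list(point)
    shifted[index] += step
    return f(*shifted) - f(*point)


def total_diff(f, point, steps):
    """Finite-step analogue of df: change in f when every variable moves at once."""
    shifted = [p + s for p, s in zip(point, steps)]
    return f(*shifted) - f(*point)


def f(x, y):
    # Hypothetical sample function, not taken from the text.
    return x * x * y + y ** 3


point = (1.5, -0.7)
steps = (1e-6, 2e-6)  # finite stand-ins for dx and dy

df = total_diff(f, point, steps)
partial_sum = sum(partial_diff(f, point, i, s) for i, s in enumerate(steps))

# The gap is second order: roughly the size of dx * dy, far smaller than df.
gap = abs(df - partial_sum)
print(df, partial_sum, gap)
```

Shrinking both steps by a factor of ten should shrink `df` by about ten but `gap` by about one hundred, which is why the gap vanishes relative to `df` in the infinitesimal setting.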

7. Building differential formulas

Using the notation established in this paper, we can build standard calculus formulas in a clear, algebraic manner. The notation and the formulas will flow directly from the basic truths of calculus and the algebraic reasoning of differentials.

7.1 The inverse function theorem for second derivatives

The standard inverse function theorem simply states that $\frac{dx}{dy} = \frac{1}{\frac{dy}{dx}}$. In other words, as implied by the algebraic arrangement of its terms, the derivative of $x$ with respect to $y$ is simply the inverse of the derivative of $y$ with respect to $x$. Using the hyperreal understanding of derivatives allows for a more straightforward way of considering this fact.

More importantly, the new notation for the second derivative likewise allows for a straightforward algebraic construction of an inverse function theorem for the second derivative. Since the second derivative of $y$ with respect to $x$ is $D_x^2 y = \frac{d^2y}{dx^2} - \frac{dy}{dx}\frac{d^2x}{dx^2}$, the second derivative of $x$ with respect to $y$ will likewise be $D_y^2 x = \frac{d^2x}{dy^2} - \frac{dx}{dy}\frac{d^2y}{dy^2}$. Is there a way to construct a formula for converting one to the other? A simple multiplication by $-\left(\frac{dx}{dy}\right)^3$ yields

$$-D_x^2 y \left(\frac{dx}{dy}\right)^3 = \frac{d^2x}{dy^2} - \frac{d^2y}{dy^2}\frac{dx}{dy}$$

Here, $\frac{dx}{dy}$ can be trivially recognized as $\frac{1}{D_x^1 y}$, and the right-hand side of the equation can be recognized as $D_y^2 x$. Therefore, this can be rewritten as

$$-D_x^2 y \left(\frac{1}{D_x^1 y}\right)^3 = D_y^2 x \tag{26}$$

which is the inverse function theorem for the second derivative.
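As a quick sanity check of Eq. 26, the sketch below uses the sample pair $y = e^x$ and $x = \ln y$, which is an assumption made here for illustration and does not appear in the text. The function name `second_derivative_of_inverse` is likewise hypothetical. For this pair both $D_x^1 y$ and $D_x^2 y$ equal $e^x$, and the second derivative of $\ln y$ with respect to $y$ is $-1/y^2$.

```python
import math


def second_derivative_of_inverse(d1y_dx, d2y_dx):
    """Eq. 26: D_y^2 x = -D_x^2 y * (1 / D_x^1 y)**3."""
    return -d2y_dx * (1.0 / d1y_dx) ** 3


x0 = 0.8
y0 = math.exp(x0)

# For y = exp(x), the first and second derivatives with respect to x are both exp(x).
predicted = second_derivative_of_inverse(math.exp(x0), math.exp(x0))

# Direct computation: x = ln(y), so the second derivative with respect to y is -1 / y**2.
exact = -1.0 / y0 ** 2

print(predicted, exact)
```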

7.2 The chain rule for the second derivative

The chain rule for the second derivative can also be easily derived from the new notation. Starting with the notation for the second derivative of $y$ with respect to $x$, we can look at the transformations needed to generate a second derivative of $y$ with respect to $t$. We will start by multiplying by $\frac{dx^2}{dt^2}$ in order to match the leading term to what is needed for the final result.

$$D_x^2 y = \frac{d^2y}{dx^2} - \frac{dy}{dx}\frac{d^2x}{dx^2} \tag{27}$$

$$D_x^2 y \left(D_t^1 x\right)^2 = \frac{d^2y}{dx^2}\frac{dx^2}{dt^2} - \frac{dy}{dx}\frac{d^2x}{dx^2}\frac{dx^2}{dt^2} \tag{28}$$

$$D_x^2 y \left(D_t^1 x\right)^2 = \frac{d^2y}{dt^2} - \frac{dy}{dx}\frac{d^2x}{dt^2} \tag{29}$$

In (29) we see that the leading term is what we want, but the second term is problematic. However, it looks a little like the leading term of the second derivative of $x$ with respect to $t$ multiplied by the first derivative of $y$ with respect to $x$. Adding that combination, $D_x^1 y\,D_t^2 x$, to our existing result will yield the desired effect.

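$$D_x^2 y \left(D_t^1 x\right)^2 + D_x^1 y\,D_t^2 x = \frac{d^2y}{dt^2} - \frac{dy}{dx}\frac{d^2x}{dt^2} + \frac{dy}{dx}\frac{d^2x}{dt^2} - \frac{dy}{dx}\frac{dx}{dt}\frac{d^2t}{dt^2} \tag{30}$$

$$D_x^2 y \left(D_t^1 x\right)^2 + D_x^1 y\,D_t^2 x = \frac{d^2y}{dt^2} - \frac{dy}{dt}\frac{d^2t}{dt^2} \tag{31}$$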

As is evident, the right-hand side is the desired result—the second derivative of y with respect to t.
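The left-hand side of Eq. 31 can be compared against a direct numerical second derivative. The sketch below assumes the sample functions $y = \sin x$ and $x = t^2$, which are not from the text, and uses a central finite difference (the hypothetical helper `second_difference`) as an independent estimate of the second derivative of $y$ with respect to $t$.

```python
import math


def second_difference(g, t, h=1e-4):
    """Central finite-difference estimate of the second derivative of g at t."""
    return (g(t + h) - 2.0 * g(t) + g(t - h)) / (h * h)


def x_of_t(t):
    return t * t  # sample inner function, assumed for illustration


def y_of_x(x):
    return math.sin(x)  # sample outer function, assumed for illustration


t0 = 0.9
x0 = x_of_t(t0)

d1x_dt = 2.0 * t0        # D_t^1 x
d2x_dt = 2.0             # D_t^2 x
d1y_dx = math.cos(x0)    # D_x^1 y
d2y_dx = -math.sin(x0)   # D_x^2 y

# Left-hand side of Eq. 31.
chain_rule_value = d2y_dx * d1x_dt ** 2 + d1y_dx * d2x_dt

# Independent estimate of the second derivative of y(x(t)) with respect to t.
numeric_value = second_difference(lambda t: y_of_x(x_of_t(t)), t0)

print(chain_rule_value, numeric_value)
```

The two values should agree to within the accuracy of the finite-difference estimate, which is limited by both the step size and floating-point rounding.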

7.3 The chain rule for multivariate derivatives

Building the chain rule for multivariate derivatives is even more straightforward. Consider a function $f(x, y)$ where $x$ and $y$ are both functions of $t$. As noted in (25), the total change in $f$, $df$, has two parts: the change due to $x$ changing and the change due to $y$ changing. So,

$$df = \partial(f, x) + \partial(f, y) \tag{32}$$

Dividing both sides by dt,

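$$\frac{df}{dt} = \frac{\partial(f, x)}{dt} + \frac{\partial(f, y)}{dt} \tag{33}$$

This is a valid equation, but it is difficult to calculate a value like $\frac{\partial(f, x)}{dt}$ directly. To make it easier to work with, we can multiply the first term by $\frac{dx}{dx}$ and the second by $\frac{dy}{dy}$:⁴

$$\frac{df}{dt} = \frac{\partial(f, x)}{dt} \cdot \frac{dx}{dx} + \frac{\partial(f, y)}{dt} \cdot \frac{dy}{dy} \tag{34}$$

$$= \frac{\partial(f, x)}{dx} \cdot \frac{dx}{dt} + \frac{\partial(f, y)}{dy} \cdot \frac{dy}{dt} \tag{35}$$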

This is the standard chain rule for multivariate derivatives.
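The regrouping from Eq. 33 to Eq. 35 can be mimicked with finite steps. This is only a sketch: the sample function $f(x, y) = x^2 y$ with $x = \cos t$ and $y = \sin t$ is an assumption made for illustration, and a small finite `dt` stands in for the infinitesimal. The variable names `eq33`, `eq35`, and `textbook` are hypothetical labels.

```python
import math


def f(x, y):
    return x * x * y  # sample function, assumed for illustration


t0, dt = 0.4, 1e-6  # dt is a finite stand-in for the infinitesimal
x0, y0 = math.cos(t0), math.sin(t0)
dx = math.cos(t0 + dt) - x0  # change in x caused by dt
dy = math.sin(t0 + dt) - y0  # change in y caused by dt

partial_x = f(x0 + dx, y0) - f(x0, y0)  # finite stand-in for partial(f, x)
partial_y = f(x0, y0 + dy) - f(x0, y0)  # finite stand-in for partial(f, y)

# Eq. 33: divide each partial differential by dt.
eq33 = partial_x / dt + partial_y / dt

# Eq. 35: the same quantity after multiplying by dx/dx and dy/dy and regrouping.
eq35 = (partial_x / dx) * (dx / dt) + (partial_y / dy) * (dy / dt)

# Textbook chain rule with analytic derivatives, for comparison.
textbook = 2.0 * x0 * y0 * (-math.sin(t0)) + x0 * x0 * math.cos(t0)

print(eq33, eq35, textbook)
```

Here `eq33` and `eq35` differ only by floating-point rounding, since the regrouping is purely algebraic, while both differ from `textbook` by an amount that shrinks with `dt`.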

8. Conclusion

While treating derivatives as ratios of differentials has long been viewed as problematic, small changes in both the understanding and notation of derivatives lead straightforwardly to algebraically manipulable total and partial differentials. These differentials provide a more straightforward basis for both doing calculus operations and deriving standard calculus rules. This approach eliminates exceptions and memorized formulas in favor of simply using algebra with differentials.

Our hope is that the flexibility and freedom of manipulation that this notation allows will both reduce the cognitive load of learning to use differential operators and make it easier for practitioners to explore new possibilities.

Acknowledgments

The authors wish to thank Dr. Enrique Valderrama for his comments on early drafts of this manuscript.

References

1. Johnson WP. The curious history of Faà di Bruno's formula. The American Mathematical Monthly. 2002;109(3):217-234. DOI: 10.1080/00029890.2002.11919857
2. Newton I. The Method of Fluxions and Infinite Series; with its Application to the Geometry of Curve-Lines, (Translated by John Colson). London: Henry Woodfall and John Nourse; 1736
3. Bell JL. Continuity and Infinitesimals. The Stanford Encyclopedia of Philosophy. Spring 2022 ed. Stanford, CA: The Metaphysics Research Lab, Philosophy Department, Stanford University; 2022
4. Berkeley G. The Analyst: A Discourse Addressed to an Infidel Mathematician. London: J. and R. Tonson and S. Draper; 1734
5. Briggs W, Cochran L, Gillett B, Schulz E. Calculus: Early Transcendentals. 3rd ed. New York: Pearson Education; 2019
6. Herrmann RA. Nonstandard analysis: A simplified approach. arXiv. 2010;math/0310351v6: 1-82
7. Bartlett J, Gaastra L, Nemati D. Hyperreal numbers for infinite divergent series. Communications of the Blyth Institute. 2020;2(1):7-15. DOI: 10.33014/issn.2640-5652.2.1.bartlett-et-al.1
8. Bartlett J, Khurshudyan AZ. Extending the algebraic manipulability of differentials. Dynamics of Continuous, Discrete and Impulsive Systems Series A: Mathematical Analysis. 2019;26:217-230
9. Cajori F. A History of Mathematical Notations. Vol. II. Chicago: Open Court Publishing; 1929

Notes

A possible objection is that the $\partial x$ in $\frac{\partial f}{\partial x}$ may not be the same infinitesimal as the $\partial x$ in $\frac{\partial x}{\partial t}$. However, the value of $\partial f$ depends on the value of the $\partial x$ in $\frac{\partial f}{\partial x}$, and the value of the $\partial x$ in $\frac{\partial x}{\partial t}$ depends on $\partial t$. So one could choose the $\partial x$s to be equal, and the values of $\partial f$ and $\partial t$ would adjust accordingly, leaving the values of $\frac{\partial f}{\partial x}$ and $\frac{\partial x}{\partial t}$ unchanged.
Some may be concerned that, in the formula presented in (14), the ratio $\frac{d^2x}{dx^2}$ reduces to zero. However, this is not necessarily true. The concern is that, since $\frac{dx}{dx}$ is always 1 (i.e., a constant), then $\frac{d^2x}{dx^2}$ should be zero. The problem with this concern is that we are no longer taking $\frac{d^2x}{dx^2}$ to be the derivative of $\frac{dx}{dx}$. Using the notation in (14), the derivative of $\frac{dx}{dx}$ would be: $\frac{d\left(\frac{dx}{dx}\right)}{dx} = \frac{d^2x}{dx^2} - \frac{dx}{dx}\frac{d^2x}{dx^2}$
Technically, both $\frac{dx}{dx}$ and $\frac{dy}{dy}$ equal $[1]$, not $1$. But, since this is an equation in the hyperreals (with hyperreal multiplication), multiplying by the hyperreal multiplication identity does not change the value of the right side of the equation.

[1] 1. Johnson WP. The curious history of fa‘a di Bruno’s formula. The American Mathematical Monthly. 2002;109(3):217-234. DOI: 10.1080/00029890.2002.11919857

[2] 2. Newton I. The Method of Fluxions and Infinite Series; with its Application to the Geometry of Curve-Lines, (Translated by John Colson). London: Henry Woodfall and John Nourse; 1736

[3] 3. Bell JL. Continuity and Infinitesimals. The Stanford Encyclopedia of Philosophy. 2022 ed. Stanford, CA: Spring, The Metaphysics Research LabPhilosophy Department Stanford University; 2022

[4] 4. Berkeley G. The Analyst: A Discourse Addressed to an Infidel Mathematician. London: J. and R. Tonson and S. Draper; 1734

[5] 5. Briggs W, Cochran L, Gillett B, Schulz E. Calculus: Early Transcendentals. 3rd ed. New York: Pearson Education; 2019

[6] 6. Herrmann RA. Nonstandard analysis: A simplified approach. arXiv. 2010;math/0310351v6: 1-82

[7] 7. Bartlett J, Gaastra L, Nemati D. Hyperreal numbers for infinite divergent series. Communications of the Blyth Institute. 2020;2(1):7-15. DOI: 10.33014/issn.2640-5652.2.1.bartlett-et-al.1

[8] 8. Bartlett J, Khurshudyan AZ. Extending the algebraic manipulability of differentials. Dynamics of Continuous, Discrete and Impulsive Systems Series A: Mathematical Analysis. 2019;26:217-230

[9] 9. Cajori F. A History of Mathematical Notations. Vol. II. Chicago: Open Court Publishing; 1929