Abstract
Differential operators usually result in derivatives expressed as a ratio of differentials. For all but the simplest derivatives, these ratios are typically not algebraically manipulable, but must be held together as a unit in order to prevent contradictions. However, this is primarily a notational and conceptual problem. The work of Abraham Robinson has shown that there is nothing contradictory about the concept of an infinitesimal differential operating in isolation. In order to make this system extend to all of calculus, however, some tweaks to standard calculus notation are required. Understanding differentials in this way actually provides a more straightforward understanding of all of calculus for students, and minimizes the number of specialized theorems students need to remember, since all terms can be freely manipulated algebraically.
Keywords
- differentials
- differential operators
- derivatives
- partial derivatives
- total derivatives
1. Introduction
Derivatives are usually written in a notation, such as
This led to a reconsideration of derivatives using the concept of a limit. In the limit definition of the derivative, the
However, the work of Abraham Robinson in the 1960s showed that there was no fundamental flaw in expanding the number system to include infinitesimals. The hyperreal numbers are an extension of the real numbers which allows for infinitesimals and infinities to be constructed in a manner equally rigorous with the real numbers. Additionally, unlike other conceptions of infinities, the hyperreal numbers have an additional advantage that infinitesimals and infinities can be manipulated using arithmetic and algebraic operations.
However, if infinitesimals can be readily considered without contradiction, why does the notation for derivative operations often lead to contradiction? The flaw here is actually in the notation itself. Because the notation was not considered factual but merely suggestive, practitioners tended to ignore the problematic cases rather than solve them. By considering new and more rigorous approaches to notation, a better notation can be developed which includes infinitesimal values, removes the contradictions, and provides a more straightforward understanding of differential notation and formulas. In these new formulations, differentials such as
2. Problem of separating differentials in modern Leibniz notation
While the problems that occur when trying to separate differentials in modern Leibniz notation are well-known, it is worth revisiting them briefly. First of all, it is interesting to note that there are essentially no inconsistencies or contradictions when dealing with first-order total differentials. For instance, taking the equation
The problems become more apparent on higher-order derivatives. The typical notation for the second derivative of
Dealing with partial derivatives brings up innumerable problematic cases even for the first derivative. If
As will be described, the issues in these problematic cases stem from deficiencies in the notation, not deficiencies in the concept of differentials as infinitesimals nor in the idea that differentials can be considered independently of each other. By taking a more rigorous approach to the development of the notation of higher order derivatives and partial derivatives, a straightforward notation can be obtained which enables differentials to be considered as fully distinct values.
3. Historical formal definitions of the derivative
The derivative of a function measures how the function changes as the independent variable varies. For instance, if the derivative of a function
Normally, slope is defined with reference to two points. When measuring velocity, for instance, which is the ratio of the change in position to the change in time, one would measure two different times with their positions and compare them. The derivative attempts to calculate the slope using only one point together with an equation. Since only one point is used, the change in
3.1 Newton’s definition
Isaac Newton provided one of the first definitions of a derivative in his book
To avoid having to define an infinitely small quantity, Newton worked with full derivatives, ratios of infinitesimals. Since Newton assumed all his variables depended on time, he could then switch out the infinitesimal change in
3.2 Leibniz’s definition
Unlike Newton, Gottfried Leibniz preferred to consider the change in
Although his calculus relied on the concept of an infinitesimal, Leibniz regarded infinitesimals as only “purely ideal entities... useful fictions, introduced to shorten arguments and aid insight” [3]. However, Leibniz was never able to rigorously define his infinitesimals or how they behaved. Therefore, while they seemed to work well, the lack of clarity caused some skeptics to regard them with suspicion, ridiculing them as “ghosts of departed quantities” [4].
3.3 Delta-epsilon (limit) definition
Concerns about the fishy nature of infinitesimals, treated like nonzero numbers when dividing but also like zero when adding, led to the reformulation of calculus using the idea of limits. The limit of
More precisely, the limit of
Limits can then be used to define the derivative of a function
When limits are used to define a derivative, it makes no sense to pull apart the change in
4. Hyperreal numbers and the definition of the derivative
While the limit definition of a derivative solves the philosophical problems of infinitesimals, it does not allow the change in
While there are different ways to construct hyperreal numbers, the approach we will take here is based on the set theory approach described by Herrmann in [6], with many of the definitions taken from there as well. We will begin by describing hyperreal numbers (including infinitesimals), and then describe the differential operator as being an operator that can be applied using infinitesimals.
For defining the infinitesimals, the core idea is to take the set of all infinitely long sequences of real numbers, denoted
4.1 Filters, the cofinite filter, and free ultrafilters: Defining big enough
A
Let
The
where
For instance, if
An
4.2 Equivalence classes of $\mathbb{R}^{\mathbb{N}}$: Classifying equivalent sequences together
Let
The free ultrafilter
This relation
The set of all these equivalence classes is called the set of the hyperreal numbers, denoted
4.3 Connecting the real numbers to the hyperreals
We can define a function
Most applications of math use real numbers, so it is helpful to define the subset of the hyperreals that corresponds to the real numbers. The image of a subset
4.4 Operations on the hyperreals
In order for algebra in
Let
for any
To construct a hyperreal greater than relation, for each
These operations establish the structure
Finally, the absolute value function can be defined for members of
The absolute value of a hyperreal number
In summary,
4.5 Infinitesimals in the hyperreals
Not all of the members of
A hyperreal number
or in other words, if its absolute value is bigger than every hyperreal that corresponds to a real number.
A hyperreal number
Similarly, a hyperreal is an infinitesimal if its absolute value is bigger than or equal to
Notice that
For a nontrivial example of an infinitesimal, consider the equivalence class
4.6 Division with infinitesimals
If infinitesimals are smaller than every real number, can you still divide by them?
Consider a nonzero infinitesimal, say
In summary, even if there are sequences in
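As a concrete illustration of dividing by an infinitesimal, here is a minimal sketch built on one particular representative sequence. The symbols $\varepsilon$ (the infinitesimal) and $\mathcal{U}$ (the chosen free ultrafilter) are labels assumed here for illustration and may differ from the symbols used elsewhere in this chapter.

```latex
% Assumed example: \varepsilon is the class of the sequence 1, 1/2, 1/3, ...
\begin{align*}
\varepsilon &= \left[\left(1, \tfrac{1}{2}, \tfrac{1}{3}, \dots\right)\right] \\
% For each real r > 0 the index set below is cofinite, so it lies in every
% free ultrafilter; hence [0] < \varepsilon < [r], i.e. \varepsilon is infinitesimal.
\{\, n \in \mathbb{N} : \tfrac{1}{n} < r \,\} &\in \mathcal{U}
  \quad \text{for every real } r > 0 \\
% Division is carried out termwise; no term of the sequence is zero.
\frac{[1]}{\varepsilon} &= \left[(1, 2, 3, \dots)\right] \\
% The quotient is an infinite hyperreal, and it is a genuine inverse:
\varepsilon \cdot \frac{[1]}{\varepsilon} &= \left[(1, 1, 1, \dots)\right] = [1]
\end{align*}
```

In this sketch the reciprocal of a nonzero infinitesimal is an infinite hyperreal, and multiplying the two recovers the multiplicative identity $[1]$.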
4.7 The standard and principal part functions
Hyperreal expressions can be converted into real expressions using the standard part function,
The principal part function,
The principal part of a hyperreal expression is important because non-principal parts, being infinitely less significant than the principal part by definition, do not affect the large-scale behaviors of smooth and continuous functions.
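A short worked example may help fix the role of the standard part function. It is a sketch under the assumption that $\operatorname{st}$ denotes the standard part and $\varepsilon$ denotes a nonzero infinitesimal; both symbols are chosen here for illustration, and $x$ is taken to be real.

```latex
% Assumed symbols: \operatorname{st} = standard part, \varepsilon = nonzero infinitesimal.
\begin{align*}
% A finite hyperreal differs from its standard part by an infinitesimal:
\operatorname{st}(3 + \varepsilon) &= 3 \\
% Applied to the difference quotient of x^2 at a real point x:
\operatorname{st}\!\left(\frac{(x + \varepsilon)^2 - x^2}{\varepsilon}\right)
  &= \operatorname{st}\!\left(\frac{2x\varepsilon + \varepsilon^2}{\varepsilon}\right) \\
  &= \operatorname{st}(2x + \varepsilon) \\
  &= 2x
\end{align*}
```

The division by $\varepsilon$ is ordinary hyperreal division; the standard part is taken only at the end to discard the remaining infinitesimal term.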
4.8 Differentials and derivatives using hyperreals
The derivative of a function
Many have a hard time conceiving of just what a differential is and means. It is easy enough to say that a differential is an infinitesimal, but how exactly are individual differentials defined, especially when not being examined in the context of a derivative? What exactly does the higher-order notation
Let us first remember that, in order to be in a relation, two (or more) variables have to be related to each other in some way. Therefore, we can imagine some variable, let us call it
Note that this variable does not need to be explicitly defined. In fact, it is better if it is not defined explicitly. The reason for this is that defining
Since
Note that
We can also rearrange (12) and obtain
These definitions provide a generic definition for the differential and consequent manipulation techniques that can be applied to any expression. Let us take the simple example
The second differential is the same process. It is merely the differential operator applied where differentials are concerned.
This second differential will typically be a second order infinitesimal. The process can be further repeated for higher order differentials.
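To make the repeated application of the differential operator concrete, the following sketch works through $y = x^2$. This example is chosen here for illustration and is not necessarily the one used earlier in the section; it assumes only the usual product rule for differentials and writes $dx^2$ for $(dx)^2$.

```latex
% Illustrative example (an assumption of this sketch): y = x^2.
\begin{align*}
y &= x^2 \\
% First differential:
dy &= 2x\,dx \\
% Second differential: apply d again, using the product rule on 2x \cdot dx.
d^2y &= d(2x)\,dx + 2x\,d(dx) \\
     &= 2\,dx^2 + 2x\,d^2x
\end{align*}
% Both terms are second-order infinitesimals. The term 2x\,d^2x is kept,
% because d^2x is not assumed to vanish.
```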
The
Since all variables in the equation are related to each other, they also share some relationship to
Ultimately, taking the differential of a function results in a
The derivative, then, is simply a ratio of differentials defined in this way. While the terminology of “taking the derivative with respect to
5. Extending the total derivative’s algebraic manipulability
The hyperreal definition of the derivative has several advantages. Once hyperreal numbers are defined, the definition of the derivative arises naturally from considering the change in a function when its (theoretical) independent variable changes infinitesimally. Unlike the limit definition, the change in
However, this requires that we rethink some of the notations from first principles. First of all, now that
When this is taken into account, differentials of any order become algebraically manipulable.
5.1 The second derivative
Before taking this idea of algebraically manipulable differentials too far, we need to note that the standard notation for the second derivative,
Order of operations is very important when doing derivatives. When doing a derivative, one
However, what does it look like to take the differential of the first derivative? Basic calculus rules tell us that the quotient rule should be used:
Then, for the second step, this can be divided by
This, in fact, yields a notation for the second derivative that is just as algebraically manipulable as the first derivative. It is not very pretty or compact, but it works algebraically.
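The two steps just described (take the differential of the first derivative with the quotient rule, then divide by $dx$) can be written out as follows. This is a sketch reconstructed from the quotient rule alone, with $dx^2$ standing for $(dx)^2$; equation numbers from the chapter are not reproduced.

```latex
\begin{align*}
% Step 1: differential of the ratio dy/dx via the quotient rule.
d\!\left(\frac{dy}{dx}\right)
  &= \frac{dx\,d(dy) - dy\,d(dx)}{dx^2}
   = \frac{dx\,d^2y - dy\,d^2x}{dx^2} \\
% Step 2: divide by dx to obtain the second derivative.
\frac{d\!\left(\frac{dy}{dx}\right)}{dx}
  &= \frac{d^2y}{dx^2} - \frac{dy}{dx}\,\frac{d^2x}{dx^2}
\end{align*}
```

The familiar form $\frac{d^2y}{dx^2}$ is recovered only in the special case where $d^2x = 0$.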
The chain rule for the second derivative fits this algebraic notation correctly, provided we replace each instance of the second derivative with its full form (cf. (30)):
This in fact works out perfectly algebraically (see Note 3).
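The algebra can be checked term by term. The sketch below assumes that $y$ depends on $x$ and $x$ depends on $t$ (the variable names are assumptions of this example) and writes every second derivative in its full form, $\frac{d^2y}{dx^2} - \frac{dy}{dx}\frac{d^2x}{dx^2}$, which follows from the quotient rule.

```latex
\begin{align*}
% Chain rule for the second derivative, y''(t) = y''(x)\,(x'(t))^2 + y'(x)\,x''(t),
% with each second derivative written in full form:
&\left(\frac{d^2y}{dx^2} - \frac{dy}{dx}\frac{d^2x}{dx^2}\right)
   \left(\frac{dx}{dt}\right)^{2}
 + \frac{dy}{dx}\left(\frac{d^2x}{dt^2} - \frac{dx}{dt}\frac{d^2t}{dt^2}\right) \\
% Expand the products, cancelling common differentials:
&\quad= \frac{d^2y}{dt^2} - \frac{dy}{dx}\frac{d^2x}{dt^2}
      + \frac{dy}{dx}\frac{d^2x}{dt^2} - \frac{dy}{dt}\frac{d^2t}{dt^2} \\
% The two middle terms cancel:
&\quad= \frac{d^2y}{dt^2} - \frac{dy}{dt}\frac{d^2t}{dt^2}
\end{align*}
```

The last line is exactly the full form of the second derivative of $y$ with respect to $t$, so the two sides agree as fractions of differentials.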
5.2 Higher order derivatives
The notation for the third and higher derivatives can be found using the same techniques as for the second derivative. To find the third derivative of
Because the expanded notation for the second and higher derivatives is much more verbose than the first derivative, it is often useful for clarity and succinctness to write derivatives using a slight modification of Arbogast’s
Below is the second and third derivative of
This gets even more important as the number of derivatives increases. Each one is more unwieldy than the previous one. However, each level can be converted to differential notation as follows:
The advantage of this modification of Arbogast’s notation over Lagrangian notation is that it clearly specifies both the variable or expression whose derivative is being taken and the variable or expression it is being taken with respect to.
Therefore, when a compact representation of higher order derivatives is needed, this paper will use Arbogast’s notation for its clarity and succinctness. This notation can be easily expanded to its differentials when necessary for manipulation.
6. Extending the partial derivative’s algebraic manipulability
The derivative gives the rate at which a function
Using limits, the partial derivative of
As with the total derivative, using limits to define the partial derivative means the change in
Also,
Both the numerator and denominator of
However, the current notation for
The notation for the partial derivative should be changed from
This makes it clear that
Using this notation,
Because the new notation can be algebraically manipulated without contradictions, it makes possible new equations where infinitesimals are not confined to ratios. For instance, the resolved contradiction proof gave the equation
Besides simplifying old equations, with the new notation it is possible to consider individual partial changes when building equations, just like considering individual total changes.
The new notation can also denote expressions like
The total differential of
Using the new definition of the partial differential, we can rewrite the formula much more straightforwardly, where the total differential is simply a sum of its partial differentials.
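A small worked example illustrates the idea of a total differential as a sum of partial differentials. The subscripted symbols $\partial_x f$ and $\partial_y f$ are an assumed way of writing "the partial differential of $f$ due to the change in $x$ (or $y$) alone"; they are labels for this sketch and may not match the exact notation adopted in this section. The function $f = x^2 y$ is likewise chosen only for illustration.

```latex
% Assumed notation: \partial_x f, \partial_y f denote partial differentials of f.
\begin{align*}
f &= x^2 y \\
% Partial differential with y held fixed:
\partial_x f &= 2xy\,dx \\
% Partial differential with x held fixed:
\partial_y f &= x^2\,dy \\
% The total differential is the sum of the partial differentials:
df &= \partial_x f + \partial_y f = 2xy\,dx + x^2\,dy
\end{align*}
```

The final line agrees with the result of applying the product rule directly to $x^2 y$.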
7. Building differential formulas
Using the notation established in this paper, we can build standard calculus formulas in a clear, algebraic manner. The notation and the formulas will flow directly from the basic truths of calculus and the algebraic reasoning of differentials.
7.1 The inverse function theorem for second derivatives
The standard inverse function theorem simply states that
More importantly, the new notation for the second derivative likewise allows for a straightforward algebraic construction of an inverse function theorem for the second derivative. Since the second derivative of
Here,
which is the inverse function theorem for the second derivative.
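One way to carry out this construction is sketched below. It assumes the full form of the second derivative obtained from the quotient rule, $\frac{d^2y}{dx^2} - \frac{dy}{dx}\frac{d^2x}{dx^2}$, and multiplies it by $-\left(\frac{dx}{dy}\right)^3$; the presentation in the chapter may arrange the steps differently.

```latex
\begin{align*}
% Multiply the second derivative of y with respect to x by -(dx/dy)^3:
-\left(\frac{d^2y}{dx^2} - \frac{dy}{dx}\frac{d^2x}{dx^2}\right)\frac{dx^3}{dy^3}
  &= -\frac{d^2y\,dx}{dy^3} + \frac{d^2x}{dy^2} \\
  &= \frac{d^2x}{dy^2} - \frac{dx}{dy}\,\frac{d^2y}{dy^2}
\end{align*}
% The right-hand side is the full form of the second derivative of x with
% respect to y, matching the classical result x''(y) = -y''(x) / (y'(x))^3.
```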
7.2 The chain rule for the second derivative
The chain rule for the second derivative can also be easily derived from the new notation. Starting with the notation for the second derivative of
In (29) we see that the leading term is what we want, but the second term is problematic. However, it looks a little like the leading term of the second derivative of
As is evident, the right-hand side is the desired result—the second derivative of
7.3 The chain rule for multivariate derivatives
Building the chain rule for multivariate derivatives is even more straightforward. Consider a function
Dividing both sides by
This is a valid equation, but it is difficult to calculate a value like
This is the standard chain rule for multivariate derivatives.
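As a sanity check on the resulting rule, the sketch below evaluates both sides for one concrete choice of functions. The functions $f = x^2 y$, $x = st$, and $y = s + t$ are assumptions of this example, and the partial derivatives are written in standard notation, each treated as a single unit.

```latex
% Illustrative functions (assumptions of this sketch):
%   f = x^2 y, \quad x = s t, \quad y = s + t
\begin{align*}
% Chain rule side:
\frac{\partial f}{\partial x}\frac{\partial x}{\partial t}
 + \frac{\partial f}{\partial y}\frac{\partial y}{\partial t}
  &= (2xy)(s) + (x^2)(1) \\
  &= 2s^2 t\,(s + t) + s^2 t^2 \\
  &= 2s^3 t + 3s^2 t^2 \\
% Direct side: substitute first, then differentiate with respect to t.
f &= s^2 t^2 (s + t) = s^3 t^2 + s^2 t^3 \\
\frac{\partial f}{\partial t} &= 2s^3 t + 3s^2 t^2
\end{align*}
```

Both computations give the same expression, as the chain rule requires.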
8. Conclusion
While treating derivatives as ratios of differentials has long been viewed as problematic, small changes in both the understanding and the notation of derivatives lead straightforwardly to algebraically manipulable total and partial differentials. These differentials provide a more straightforward basis both for doing calculus operations and for deriving standard calculus rules. This approach replaces exceptions and memorized formulas with ordinary algebra on differentials.
Our hope is that the flexibility and freedom of manipulation that this notation allows will both reduce the cognitive load of learning to use differential operators and make it easier for practitioners to explore new possibilities.
Acknowledgments
The authors wish to thank Dr. Enrique Valderrama for his comments on early drafts of this manuscript.
References
1. Johnson WP. The curious history of Faà di Bruno’s formula. The American Mathematical Monthly. 2002;109(3):217-234. DOI: 10.1080/00029890.2002.11919857
2. Newton I. The Method of Fluxions and Infinite Series; with its Application to the Geometry of Curve-Lines (Translated by John Colson). London: Henry Woodfall and John Nourse; 1736
3. Bell JL. Continuity and Infinitesimals. The Stanford Encyclopedia of Philosophy. Spring 2022 ed. Stanford, CA: The Metaphysics Research Lab, Philosophy Department, Stanford University; 2022
4. Berkeley G. The Analyst: A Discourse Addressed to an Infidel Mathematician. London: J. and R. Tonson and S. Draper; 1734
5. Briggs W, Cochran L, Gillett B, Schulz E. Calculus: Early Transcendentals. 3rd ed. New York: Pearson Education; 2019
6. Herrmann RA. Nonstandard analysis: A simplified approach. arXiv. 2010;math/0310351v6:1-82
7. Bartlett J, Gaastra L, Nemati D. Hyperreal numbers for infinite divergent series. Communications of the Blyth Institute. 2020;2(1):7-15. DOI: 10.33014/issn.2640-5652.2.1.bartlett-et-al.1
8. Bartlett J, Khurshudyan AZ. Extending the algebraic manipulability of differentials. Dynamics of Continuous, Discrete and Impulsive Systems Series A: Mathematical Analysis. 2019;26:217-230
9. Cajori F. A History of Mathematical Notations. Vol. II. Chicago: Open Court Publishing; 1929
Notes
- A possible objection is that the $\partial x$ in $\frac{\partial f}{\partial x}$ may not be the same infinitesimal as the $\partial x$ in $\frac{\partial x}{\partial t}$. However, the value of $\partial f$ depends on the value of the $\partial x$ in $\frac{\partial f}{\partial x}$, and the value of the $\partial x$ in $\frac{\partial x}{\partial t}$ depends on $\partial t$. So one could choose the $\partial x$s to be equal, and the values of $\partial f$ and $\partial t$ would adjust accordingly, leaving the values of $\frac{\partial f}{\partial x}$ and $\frac{\partial x}{\partial t}$ unchanged.
- Some may be concerned that, in the formula presented in (14), the ratio $\frac{d^2x}{dx^2}$ reduces to zero. However, this is not necessarily true. The concern is that, since $\frac{dx}{dx}$ is always 1 (i.e., a constant), $\frac{d^2x}{dx^2}$ should be zero. The problem with this concern is that we are no longer taking $\frac{d^2x}{dx^2}$ to be the derivative of $\frac{dx}{dx}$. Using the notation in (14), the derivative of $\frac{dx}{dx}$ would be: $\frac{d\left(\frac{dx}{dx}\right)}{dx} = \frac{d^2x}{dx^2} - \frac{dx}{dx}\,\frac{d^2x}{dx^2}$.
- Technically, both $\frac{dx}{dx}$ and $\frac{dy}{dy}$ equal $[1]$, not 1. But, since this is an equation in the hyperreals (with hyperreal multiplication), multiplying by the hyperreal multiplicative identity does not change the value of the right side of the equation.