Jump to content

Finalizer

From Wikipedia, the free encyclopedia
(Redirected from Finalization)

In computer science, a finalizer or finalize method is a special method that performs finalization, generally some form of cleanup. A finalizer is executed during object destruction, prior to the object being deallocated, and is complementary to an initializer, which is executed during object creation, following allocation. Finalizers are strongly discouraged by some, due to difficulty in proper use and the complexity they add, and alternatives are suggested instead, mainly the dispose pattern[1] (see problems with finalizers).

The term finalizer is mostly used in object-oriented and functional programming languages that use garbage collection, of which the archetype is Smalltalk. This is contrasted with a destructor, which is a method called for finalization in languages with deterministic object lifetimes, archetypically C++.[2][3] These are generally exclusive: a language will have either finalizers (if automatically garbage collected) or destructors (if manually memory managed), but in rare cases a language may have both, as in C++/CLI and D, and in case of reference counting (instead of tracing garbage collection), terminology varies. In technical use, finalizer may also be used to refer to destructors, as these also perform finalization, and some subtler distinctions are drawn – see terminology. The term final also indicates a class that cannot be inherited; this is unrelated.

Terminology

[edit]

The terminology of finalizer and finalization versus destructor and destruction varies between authors and is sometimes unclear.

In common use, a destructor is a method called deterministically on object destruction, and the archetype is C++ destructors; while a finalizer is called non-deterministically by the garbage collector, and the archetype is Java finalize methods.

For languages that implement garbage collection via reference counting, terminology varies, with some languages such as Objective-C and Perl using destructor, and other languages such as Python using finalizer (per spec, Python is garbage collected, but the reference CPython implementation since its version 2.0 uses a combination of reference counting and garbage collection). This reflects that reference counting results in semi-deterministic object lifetime: for objects that are not part of a cycle, objects are destroyed deterministically when the reference count drops to zero, but objects that are part of a cycle are destroyed non-deterministically, as part of a separate form of garbage collection.

In certain narrow technical use, constructor and destructor are language-level terms, meaning methods defined in a class, while initializer and finalizer are implementation-level terms, meaning methods called during object creation or destruction. Thus for example the original specification for the C# language referred to "destructors", even though C# is garbage-collected, but the specification for the Common Language Infrastructure (CLI), and the implementation of its runtime environment as the Common Language Runtime (CLR), referred to "finalizers". This is reflected in the C# language committee's notes, which read in part: "The C# compiler compiles destructors to ... [probably] instance finalizer[s]".[4][5] This terminology is confusing, and thus more recent versions of the C# spec refer to the language-level method as "finalizers".[6]

Another language that does not make this terminology distinction is D. Although D classes are garbage collected, their cleanup functions are called destructors.[7]

Use

[edit]

Finalization is mostly used for cleanup, to release memory or other resources: to deallocate memory allocated via manual memory management; to clear references if reference counting is used (decrement reference counts); to release resources, particularly in the resource acquisition is initialization (RAII) idiom; or to unregister an object. The amount of finalization varies significantly between languages, from extensive finalization in C++, which has manual memory management, reference counting, and deterministic object lifetimes; to often no finalization in Java, which has non-deterministic object lifetimes and is often implemented with a tracing garbage collector. It is also possible for there to be little or no explicit (user-specified) finalization, but significant implicit finalization, performed by the compiler, interpreter, or runtime; this is common in case of automatic reference counting, as in the CPython reference implementation of Python, or in Automatic Reference Counting in Apple's implementation of Objective-C, which both automatically break references during finalization. A finalizer can include arbitrary code; a particularly complex use is to automatically return the object to an object pool.

Memory deallocation during finalization is common in languages like C++ where manual memory management is standard, but also occurs in managed languages when memory has been allocated outside of the managed heap (externally to the language); in Java this occurs with Java Native Interface (JNI) and ByteBuffer objects in New I/O (NIO). This latter can cause problems due to the garbage collector being unable to track these external resources, so they will not be collected aggressively enough, and can cause out-of-memory errors due to exhausting unmanaged memory – this can be avoided by treating native memory as a resource and using the dispose pattern, as discussed below.

Finalizers are generally both much less necessary and much less used than destructors. They are much less necessary because garbage collection automates memory management, and much less used because they are not generally executed deterministically – they may not be called in a timely manner, or even at all, and the execution environment cannot be predicted – and thus any cleanup that must be done in a deterministic way must instead be done by some other method, most frequently manually via the dispose pattern. Notably, both Java and Python do not guarantee that finalizers will ever be called, and thus they cannot be relied on for cleanup.

Due to the lack of programmer control over their execution, it is usually recommended to avoid finalizers for any but the most trivial operations. In particular, operations often performed in destructors are not usually appropriate for finalizers. A common anti-pattern is to write finalizers as if they were destructors, which is both unnecessary and ineffectual, due to differences between finalizers and destructors. This is particularly common among C++ programmers, as destructors are heavily used in idiomatic C++, following the resource acquisition is initialization (RAII) idiom.

Syntax

[edit]

Programming languages that use finalizers include C++/CLI, C#, Clean, Go, Java, JavaScript and Python. Syntax varies significantly by language.

In Java, a finalizer is a method called finalize, which overrides the Object.finalize method.[8]

In JavaScript, FinalizationRegistry allows you to request a callback when an object is garbage-collected.

In Python, a finalizer is a method called __del__.

In Perl, a finalizer is a method called DESTROY.

UML class in C# containing a constructor and a finalizer.

In C#, a finalizer (called "destructor" in earlier versions of the standard) is a method whose name is the class name with ~ prefixed, as in ~Foo – this is the same syntax as a C++ destructor, and these methods were originally called "destructors", by analogy with C++, despite having different behavior, but were renamed to "finalizers" due to the confusion this caused.[6]

In C++/CLI, which has both destructors and finalizers, a destructor is a method whose name is the class name with ~ prefixed, as in ~Foo (as in C#), and a finalizer is a method whose name is the class name with ! prefixed, as in !Foo.

In Go finalizers are applied to a single pointer by calling the runtime.SetFinalizer function in the standard library.[9]

Implementation

[edit]

A finalizer is called when an object is garbage collected – after an object has become garbage (unreachable), but before its memory is deallocated. Finalization occurs non-deterministically, at the discretion of the garbage collector, and might never occur. This contrasts with destructors, which are called deterministically as soon as an object is no longer in use, and are always called, except in case of uncontrolled program termination. Finalizers are most frequently instance methods, due to needing to do object-specific operations.

The garbage collector must also account for the possibility of object resurrection. Most commonly this is done by first executing finalizers, then checking whether any objects have been resurrected, and if so, aborting their destruction. This additional check is potentially expensive – a simple implementation re-checks all garbage if even a single object has a finalizer – and thus both slows down and complicates garbage collection. For this reason, objects with finalizers may be collected less frequently than objects without finalizers (only on certain cycles), exacerbating problems caused by relying on prompt finalization, such as resource leaks.

If an object is resurrected, there is the further question of whether its finalizer is called again, when it is next destroyed – unlike destructors, finalizers are potentially called multiple times. If finalizers are called for resurrected objects, objects may repeatedly resurrect themselves and be indestructible; this occurs in the CPython implementation of Python prior to Python 3.4, and in CLR languages such as C#. To avoid this, in many languages, including Java, Objective-C (at least in recent Apple implementations), and Python from Python 3.4, objects are finalized at most once, which requires tracking if the object has been finalized yet.

In other cases, notably CLR languages like C#, finalization is tracked separately from the objects themselves, and objects can be repeatedly registered or deregistered for finalization.

Problems

[edit]

Depending on the implementation, finalizers can cause a significant number of problems, and are thus strongly discouraged by a number of authorities.[10][11] These problems include:[10]

  • Finalizers may not be called in a timely manner, or at all, so they cannot be relied upon to persist state, release scarce resources, or do anything else of importance.
  • Finalizers may result in object resurrection, which is often a programming error and whose very possibility significantly slows down and complicates garbage collection.
  • Finalizers are run based on garbage collection, which is generally based on managed memory pressure – they are not run in case of other resource scarcity, and thus are not suited for managing other scarce resources.
  • Finalizers do not run in a specified order, and cannot rely on class invariants (as they may refer to other objects that have already been finalized).
  • Slow finalizers may delay other finalizers.
  • Exceptions within finalizers generally cannot be handled, because the finalizer is run in an unspecified environment, and may be either ignored or cause uncontrolled program termination.
  • Finalizers may reference and accidentally finalize live objects, violating program invariants.
  • Finalizers may cause synchronization issues, even in otherwise sequential (single-threaded) programs, when finalization is done in a separate threads.[12]
  • Finalizers may cause deadlocks if synchronization mechanisms such as locks are used, due to not being run in a specified order and possibly being run concurrently.
  • Finalizers that are run during program termination cannot rely on the usual runtime environment, and thus may fail due to incorrect assumptions – for this reason finalizers are often not run during termination.

Further, finalizers may fail to run due to objects remaining reachable beyond when they are expected to be garbage, either due to programming errors or due to unexpected reachability. For example, when Python catches an exception (or an exception is not caught in interactive mode), it keeps a reference to the stack frame where the exception was raised, which keeps objects referenced from that stack frame alive.

In Java, finalizers in a superclass can also slow down garbage collection in a subclass, as the finalizer can potentially refer to fields in the subclass, and thus the field cannot be garbage collected until the following cycle, once the finalizer has run.[10] This can be avoided by using composition over inheritance.

Resource management

[edit]

A common anti-pattern is to use finalizers to release resources, by analogy with the resource acquisition is initialization (RAII) idiom of C++: acquire a resource in the initializer (constructor), and release it in the finalizer (destructor). This does not work, for a number of reasons. Most basically, finalizers may never be called, and even if called, may not be called in a timely manner – thus using finalizers to release resources will generally cause resource leaks. Further, finalizers are not called in a prescribed order, while resources often need to be released in a specific order, frequently the opposite order in which they were acquired. Also, as finalizers are called at the discretion of the garbage collector, they will often only be called under managed memory pressure (when there is little managed memory available), regardless of resource pressure – if scarce resources are held by garbage but there is plenty of managed memory available, garbage collection may not occur, thus not reclaiming these resources.

Thus instead of using finalizers for automatic resource management, in garbage-collected languages one instead must manually manage resources, generally by using the dispose pattern. In this case resources may still be acquired in the initializer, which is called explicitly on object instantiation, but are released in the dispose method. The dispose method may be called explicitly, or implicitly by language constructs such as C#'s using, Java's try-with-resources, or Python's with.

However, in certain cases both the dispose pattern and finalizers are used for releasing resources. This is mostly found in CLR languages such as C#, where finalization is used as a backup for disposal: when a resource is acquired, the acquiring object is queued for finalization so that the resource is released on object destruction, even if the resource is not released by manual disposal.

Deterministic and non-deterministic object lifetimes

[edit]

In languages with deterministic object lifetimes, notably C++, resource management is frequently done by tying resource possession lifetime to object lifetime, acquiring resources during initialization and releasing them during finalization; this is known as resource acquisition is initialization (RAII). This ensures that resource possession is a class invariant, and that resources are released promptly when the object is destroyed.

However, in languages with non-deterministic object lifetimes – which include all major languages with garbage collection, such as C#, Java, and Python – this does not work, because finalization may not be timely or may not happen at all, and thus resources may not be released for a long time or even at all, causing resource leaks. In these languages resources are instead generally managed manually via the dispose pattern: resources may still be acquired during initialization, but are released by calling a dispose method. Nevertheless, using finalization for releasing resources in these languages is a common anti-pattern, and forgetting to call dispose will still cause a resource leak.

In some cases both techniques are combined, using an explicit dispose method, but also releasing any still-held resources during finalization as a backup. This is commonly found in C#, and is implemented by registering an object for finalization whenever a resource is acquired, and suppressing finalization whenever a resource is released.

Object resurrection

[edit]

If user-specified finalizers are allowed, it is possible for finalization to cause object resurrection, as the finalizers can run arbitrary code, which may create references from live objects to objects being destroyed. For languages without garbage collection, this is a severe bug, and causes dangling references and memory safety violations; for languages with garbage collection, this is prevented by the garbage collector, most commonly by adding another step to garbage collection (after running all user-specified finalizers, check for resurrection), which complicates and slows down garbage collection.

Further, object resurrection means that an object may not be destroyed, and in pathological cases an object can always resurrect itself during finalization, making itself indestructible. To prevent this, some languages, like Java and Python (from Python 3.4) only finalize objects once, and do not finalize resurrected objects.[citation needed] Concretely this is done by tracking if an object has been finalized on an object-by-object basis. Objective-C also tracks finalization (at least in recent[when?] Apple versions[clarification needed]) for similar reasons, treating resurrection as a bug.

A different approach is used in the .NET Framework, notably C# and Visual Basic .NET, where finalization is tracked by a "queue", rather than by object. In this case, if a user-specified finalizer is provided, by default the object is only finalized once (it is queued for finalization on creation, and dequeued once it is finalized), but this can be changed via calling the GC module. Finalization can be prevented by calling GC.SuppressFinalize, which dequeues the object, or reactivated by calling GC.ReRegisterForFinalize, which enqueues the object. These are particularly used when using finalization for resource management as a supplement to the dispose pattern, or when implementing an object pool.

Contrast with initialization

[edit]

Finalization is formally complementary to initialization – initialization occurs at the start of lifetime, finalization at the end – but differs significantly in practice. Both variables and objects are initialized, mostly to assign values, but in general only objects are finalized, and in general there is no need to clear values – the memory can simply be deallocated and reclaimed by the operating system.

Beyond assigning initial values, initialization is mostly used to acquire resources or to register an object with some service (like an event handler). These actions have symmetric release or unregister actions, and these can symmetrically be handled in a finalizer, which is done in RAII. However, in many languages, notably those with garbage collection, object lifetime is asymmetric: object creation happens deterministically at some explicit point in the code, but object destruction happens non-deterministically, in some unspecified environment, at the discretion of the garbage collector. This asymmetry means that finalization cannot be effectively used as the complement of initialization, because it does not happen in a timely manner, in a specified order, or in a specified environment. The symmetry is partially restored by also disposing of the object at an explicit point, but in this case disposal and destruction do not happen at the same point, and an object may be in a "disposed but still alive" state, which weakens the class invariants and complicates use.

Variables are generally initialized at the start of their lifetime, but not finalized at the end of their lifetime – though if a variable has an object as its value, the object may be finalized. In some cases variables are also finalized: GCC extensions allow finalization of variables.

Connection with finally

[edit]

As reflected in the naming, "finalization" and the finally construct both fulfill similar purposes: performing some final action, generally cleaning up, after something else has finished. They differ in when they occur – a finally clause is executed when program execution leaves the body of the associated try clause – this occurs during stack unwind, and there is thus a stack of pending finally clauses, in order – while finalization occurs when an object is destroyed, which happens depending on the memory management method, and in general there is simply a set of objects awaiting finalization – often on the heap – which need not happen in any specific order.

However, in some cases these coincide. In C++, object destruction is deterministic, and the behavior of a finally clause can be produced by having a local variable with an object as its value, whose scope is a block corresponds to the body of a try clause – the object is finalized (destructed) when execution exits this scope, exactly as if there were a finally clause. For this reason, C++ does not have a finally construct – the difference being that finalization is defined in the class definition as the destructor method, rather than at the call site in a finally clause.

Conversely, in the case of a finally clause in a coroutine, like in a Python generator, the coroutine may never terminate – only ever yielding – and thus in ordinary execution the finally clause is never executed. If one interprets instances of a coroutine as objects, then the finally clause can be considered a finalizer of the object, and thus can be executed when the instance is garbage collected. In Python terminology, the definition of a coroutine is a generator function, while an instance of it is a generator iterator, and thus a finally clause in a generator function becomes a finalizer in generator iterators instantiated from this function.

History

[edit]

The notion of finalization as a separate step in object destruction dates to Montgomery (1994),[13] by analogy with the earlier distinction of initialization in object construction in Martin & Odell (1992).[14] Literature prior to this point used "destruction" for this process, not distinguishing finalization and deallocation, and programming languages dating to this period, like C++ and Perl, use the term "destruction". The terms "finalize" and "finalization" are also used in the influential book Design Patterns (1994).[a][15] The introduction of Java in 1995 contained finalize methods, which popularized the term and associated it with garbage collection, and languages from this point generally make this distinction and use the term "finalization", particularly in the context of garbage collection.

Notes

[edit]
  1. ^ Published 1994, with a 1995 copyright.

References

[edit]
  1. ^ Jagger, Perry & Sestoft 2007, p. 542, "In C++, a destructor is called in a determinate manner, whereas, in C#, a finalizer is not. To get determinate behavior from C#, one should use Dispose.
  2. ^ Boehm, Hans-J. (2002). Destructors, Finalizers, and Synchronization. Symposium on Principles of Programming Languages (POPL).
  3. ^ Jagger, Perry & Sestoft 2007, p. 542, C++ destructors versus C# finalizers C++ destructors are determinate in the sense that they are run at known points in time, in a known order, and from a known thread. They are thus semantically very different from C# finalizers, which are run at unknown points in time, in an unknown order, from an unknown thread, and at the discretion of the garbage collector.
  4. ^ In full: "We're going to use the term "destructor" for the member which executes when an instance is reclaimed. Classes can have destructors; structs can't. Unlike in C++, a destructor cannot be called explicitly. Destruction is non-deterministic – you can't reliably know when the destructor will execute, except to say that it executes at some point after all references to the object have been released. The destructors in an inheritance chain are called in order, from most descendant to least descendant. There is no need (and no way) for the derived class to explicitly call the base destructor. The C# compiler compiles destructors to the appropriate CLR representation. For this version that probably means an instance finalizer that is distinguished in metadata. CLR may provide static finalizers in the future; we do not see any barrier to C# using static finalizers.", May 12th, 1999.
  5. ^ What’s the difference between a destructor and a finalizer?, Eric Lippert, Eric Lippert’s Blog: Fabulous Adventures In Coding, 21 Jan 2010
  6. ^ a b Jagger, Perry & Sestoft 2007, p. 542, "In the previous version of this standard, what is now referred to as a “finalizer” was called a “destructor”. Experience has shown that the term “destructor” caused confusion and often resulted in incorrect expectations, especially to programmers knowing C++. In C++, a destructor is called in a determinate manner, whereas, in C#, a finalizer is not. To get determinate behavior from C#, one should use Dispose."
  7. ^ Class destructors Class destructors in D
  8. ^ java.lang, Class Object: finalize
  9. ^ "Runtime package - runtime - PKG.go.dev".
  10. ^ a b c "MET12-J. Do not use finalizers", Dhruv Mohindra, The CERT Oracle Secure Coding Standard for Java, 05. Methods (MET) Archived 2014-05-04 at the Wayback Machine
  11. ^ object.__del__(self), The Python Language Reference, 3. Data model: "... __del__() methods should do the absolute minimum needed to maintain external invariants."
  12. ^ Hans-J. Boehm, Finalization, Threads, and the Java™ Technology-Based Memory Model, JavaOne Conference, 2005.
  13. ^ Montgomery 1994, p. 120, "As with object instantiation, design for object termination can benefit from implementation of two operations for each class—a finalize and a terminate operation. A finalize operation breaks associations with other objects, ensuring data structure integrity."
  14. ^ Montgomery 1994, p. 119, "Consider implementing class instantiation as a create and initialize operation, as suggested by Martin and Odell. The first allocates storage for new objects, and the second constructs the object to adhere to specifications and constraints."
  15. ^ "Every new class has a fixed implementation overhead (initialization, finalization, etc.).", "destructor In C++, an operation that is automatically invoked to finalize an object that is about to be deleted."

Further reading

[edit]
[edit]