close-watcher

Introduction: A web API proposal for watching for close signals (e.g. Esc, Android back button, ...)

The problem

Various UI components have a "modal" or "popup" behavior. For example:

  • a <dialog> element, especially the showModal() API;
  • a sidebar menu;
  • a lightbox;
  • a custom picker input (e.g. date picker);
  • a custom context menu;
  • fullscreen mode.

An important common feature of these components is that they are designed to be easy to close, with a uniform interaction mechanism for doing so. Typically, this is the Esc key on desktop platforms, and the back button on some mobile platforms (notably Android). Game consoles also tend to use a specific button as their "close/cancel/back" button. Finally, accessibility technology sometimes provides specific close signals for its users, e.g. iOS VoiceOver's "dismiss an alert or return to the previous screen" gesture.

We define a close signal as a platform-mediated interaction that's intended to close an in-page component. This is distinct from page-mediated interactions, such as clicking on an "x" or "Done" button, or clicking on the backdrop outside of the modal.

Currently, web developers have no good way to handle these close signals. This is especially problematic on Android devices, where the back button is the traditional close signal. Imagine a user filling in a twenty-field form, with the last item being a custom date picker modal. The user might press the back button hoping to close the date picker, like they would in a native app. Instead, the back button navigates the web page's history tree, likely closing the whole form and losing the filled-in information. The problem of how to handle close signals is not limited to Android, however: implementing a good experience for users of different browsers, platforms, and accessibility technologies requires a lot of user agent sniffing today.

This explainer proposes a new API to enable web developers, especially component authors, to better handle these close signals.

Goals

Our primary goals are as follows:

  • Allow web developers to intercept close signals (e.g. Esc on desktop, back button on Android, "z" gesture on iOS using VoiceOver) to close custom components that they create.

  • Discourage platform-specific code for handling modal close signals, by providing an API that abstracts away the differences.

  • Allow future platforms to introduce new close signals (e.g., a "swipe away" gesture), such that code written against the API proposed here automatically works on such platforms.

  • Prevent abuse that traps the user on a given history state by disabling the back button's ability to actually navigate backward in history.

  • Be usable by component authors, in particular by avoiding any global state modifications that might interfere with the app or with other components.

  • Specify uniform-per-platform behavior across existing platform-provided components, e.g. <dialog>, <input> pickers, and fullscreen. (For example: currently the back button on Android does not close modal <dialog>s, but instead navigates history. It does close <input> pickers and close fullscreen.)

  • Explain <dialog>'s existing close and cancel events in terms of this model.

The following is a goal we wish we could meet, but don't believe is possible to meet while also achieving our primary goals:

  • Avoid an awkward transition period, where the Android back button closes components on sites that adopt this new API, but navigates history on sites that haven't adopted it. In particular, right now users generally know the Android back button fails to close modals in web apps and avoid it when modals are open; we worry about this API causing a state where users are no longer sure which action it performs.

The following is a goal we think is probably desirable, but the additional complexity makes us want to pursue it in a future extension:

  • Allow the developer to confirm a close signal, in a similar fashion to beforeunload, to avoid potential data loss.

What developers are doing today

On desktop platforms, this problem is currently solved by listening for the keydown event, and closing the component when the Esc key is pressed. Built-in platform APIs, such as <dialog>, fullscreen, or <input type="date">, will do this automatically. Note that this can be easy to get wrong, e.g. by accidentally listening for keyup or keypress.
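
A minimal sketch of the desktop approach described above. The helper name closeOnEscape is hypothetical and not part of any platform API; it listens for keydown (not keyup or keypress) and returns a function that removes the listener again.

```javascript
// Hypothetical helper: calls onClose when Esc is pressed on `target`.
// Returns a function that removes the listener again.
function closeOnEscape(target, onClose) {
  const listener = (event) => {
    // `key` is "Escape" for the Esc key.
    if (event.key === 'Escape') {
      onClose();
    }
  };
  // Listen for keydown; keyup and keypress are the common mistakes.
  target.addEventListener('keydown', listener);
  return () => target.removeEventListener('keydown', listener);
}
```

In a page, `target` would typically be `document`, e.g. `const stop = closeOnEscape(document, () => myModal.close());`, with `stop()` called once the component closes by other means.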

On mobile platforms, getting the right behavior is significantly harder. First, platform-specific code is required: to do nothing on iOS (or, perhaps, try to detect VoiceOver and then intercept its "z" gesture?), to capture the back button on Android, and to listen for gamepad inputs on the PlayStation browser. Next, capturing the back button on Android requires manipulating the history list using the history API. This is a poor fit for several reasons:

  • This UI state is non-navigational, i.e., the URL doesn't and shouldn't change when opening a component. When the page is reloaded or shared, web developers generally don't want to automatically re-open the same component.

  • It's very hard to determine when the component's state is popped out of history, because both moving forward and backward through history emit the same popstate event, and that event can be emitted for history navigation that traverses more than one entry at a time.

  • The history.state API, used to store the component's state, is rather fragile: user-initiated fragment navigation, as well as other calls to history.pushState() and history.replaceState(), can override it.

  • History navigation is not cancelable. Implementing "Are you sure you want to close this dialog without saving?" requires letting the history navigation complete, then potentially re-establishing the dialog's state if the user declines to close the dialog.

  • A shared component that attempts to use the history API to implement these techniques can easily corrupt a web application's router.
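
The history-based workaround described above can be sketched roughly as follows. The helper name openWithHistoryEntry and the state shape `{ modalOpen: true }` are illustrative assumptions, and `win` stands for the page's `window` object. The sketch shows only the basic shape of the technique and is subject to every problem in the list above.

```javascript
// Hypothetical helper illustrating the workaround, not a recommendation.
// `win` is expected to look like a browser `window`.
function openWithHistoryEntry(win, onClose) {
  // Push a dummy entry so the back button has something to pop.
  // The URL is left unchanged, since this UI state is non-navigational.
  win.history.pushState({ modalOpen: true }, '');

  const onPopState = () => {
    // popstate fires for both back and forward traversals, and for
    // multi-entry jumps, so this cannot tell why the entry was left.
    win.removeEventListener('popstate', onPopState);
    onClose();
  };
  win.addEventListener('popstate', onPopState);

  // Closing by other means (e.g. an "x" button) must also pop the dummy
  // entry, which in turn triggers the popstate listener above.
  return () => win.history.back();
}
```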

Proposal

The proposal is to introduce a new API, the CloseWatcher class, which has the following basic API:

// Note: the constructor will throw a "NotAllowedError" DOMException, if used
// too many times without user interaction.
const watcher = new CloseWatcher();

// This fires when the user sends a close signal, e.g. by pressing Esc on
// desktop or by pressing Android's back button.
watcher.onclose = () => {
  myModal.close();
};

// You should destroy watchers which are no longer needed, e.g. if the
// modal closes normally. This will prevent future events on this watcher.
myModalCloseButton.onclick = () => {
  watcher.destroy();
  myModal.close();
};

If more than one CloseWatcher is active at a given time, then only the most-recently-constructed one gets events delivered to it. A watcher becomes inactive after a close event is delivered, or the watcher is explicitly destroy()ed.
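
The delivery rule can be illustrated with a small model. This is not the proposed implementation: the names CloseWatcherStackModel, create, and sendCloseSignal are hypothetical, and plain callbacks stand in for close events.

```javascript
// Toy model of the "most recently constructed active watcher wins" rule.
class CloseWatcherStackModel {
  #active = [];

  // Registers a watcher; returns a handle with a destroy() method.
  create(onClose) {
    const watcher = { onClose };
    this.#active.push(watcher);
    return { destroy: () => this.#remove(watcher) };
  }

  // Simulates a platform close signal (e.g. Esc or the back button).
  // Returns false if no watcher was active to receive it.
  sendCloseSignal() {
    const watcher = this.#active.pop();
    if (watcher === undefined) return false;
    watcher.onClose(); // The watcher is inactive from here on.
    return true;
  }

  #remove(watcher) {
    const index = this.#active.indexOf(watcher);
    if (index !== -1) this.#active.splice(index, 1);
  }
}

// Usage: two components open at once, e.g. a picker on top of a sidebar.
const stack = new CloseWatcherStackModel();
stack.create(() => console.log('sidebar closed'));
stack.create(() => console.log('picker closed'));
stack.sendCloseSignal(); // delivered to the picker's watcher only
stack.sendCloseSignal(); // now delivered to the sidebar's watcher
```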

Read on for more details and realistic usage examples.

User activation gating

It was mentioned above that the new CloseWatcher() constructor can throw if called too many times without user activation. Specifically, the design is that the page gets one "free" active CloseWatcher at a time. After that, any further CloseWatcher constructions require transient activation, i.e., the construction must happen as part of or shortly after a user interaction event like click, pointerup, or keydown.

The motivation for this is that, for platforms like Android where the modal close gesture is to use the back button, we need to prevent abuse that traps the user on a page by effectively disabling their back button. This restriction means that the back button can only be intercepted as many times as user activation was given to the document, plus one.

The allowance for a single non-activation-triggered CloseWatcher is to allow use cases like "session inactivity timeout" modals, or high-priority interrupt popups. The page can create a single one of these at a given time, which we believe strikes a good balance between meeting realistic use cases and preventing abuse.

Note that for developers, this means that calling watcher.destroy() properly is important, as doing so will free up the "free CloseWatcher slot", if it has been previously consumed.
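
The gating rule can be modeled in a few lines. This is a sketch under stated assumptions, not the proposed algorithm: ActivationGateModel and its methods are hypothetical names, the `hasTransientActivation` flag stands in for the browser's own activation tracking, and the model assumes that user activation, when present, is used in preference to the free slot. Expiry of the activation itself is not modeled.

```javascript
// Toy model of "one free watcher, further ones need user activation".
class ActivationGateModel {
  #freeSlotHolder = null;

  // Returns a token for the new watcher, or throws if creation is not allowed.
  create(hasTransientActivation) {
    const token = {};
    if (hasTransientActivation) return token; // backed by a user interaction
    if (this.#freeSlotHolder === null) {
      this.#freeSlotHolder = token; // consumes the single free slot
      return token;
    }
    // The proposal throws a "NotAllowedError" DOMException here; a plain
    // Error with the same name keeps this model self-contained.
    throw Object.assign(new Error('user activation required'), {
      name: 'NotAllowedError',
    });
  }

  // Call when a watcher is destroyed or has received its close event.
  release(token) {
    if (this.#freeSlotHolder === token) this.#freeSlotHolder = null;
  }
}
```

In this model, a timer-driven "session inactivity timeout" modal would call `create(false)` and succeed as long as no other watcher holds the free slot, which is why releasing the slot on destroy matters.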

Signaling close yourself

The API has an additional convenience method, watcher.signalClosed(), which acts as if a close signal had been sent by the user. The intended use case is to allow centralizing close-handling code. So the above example of

watcher.onclose = () => myModal.close();

myModalCloseButton.onclick = () => {
  watcher.destroy();
  myModal.close();
};

could be replaced by

watcher.onclose = () => myModal.close();

myModalCloseButton.onclick = () => watcher.signalClosed();

deduplicating the myModal.close() call by having the developer put all their close-handling logic into the watcher's close event handler.

As usual, reaching the close event will inactivate the CloseWatcher, meaning it receives no further events in the future and it no longer occupies the "free CloseWatcher slot", if it was previously doing so.

Abuse analysis

As discussed above, for platforms like Android where the close signal is to use the back button, we need to prevent abuse that traps the user on a page by effectively disabling their back button. The user activation gating is intended to combat that. Notably, that protection is already stronger than anything done today for the history.pushState() API, which is another means by which apps can attempt to trap the user on the page. See discussion below for more on that.

Additionally, we note that in most back button UIs, the user always has an escape hatch of holding down the back button and explicitly choosing a history step to navigate back to. This is never a close signal.

Realistic examples

The above sections give illustrative usage of the API. The following ones show how the API could be incorporated into realistic apps and UI components.

A sidebar

A sidebar (e.g. behind a hamburger menu) that wants to hide itself on a user-provided close signal could be hooked up as follows:

const hamburgerMenuButton = document.querySelector('#hamburger-menu-button');
const sidebar = document.querySelector('#sidebar');

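hamburgerMenuButton.addEventListener('click', () => {
  const watcher = new CloseWatcher();

  sidebar.animate([{ transform: 'translateX(-200px)' }, { transform: 'translateX(0)' }]);

  // Close on clicks outside the sidebar. Clicks on the menu button are
  // ignored: the click that opened the sidebar is still bubbling when this
  // listener is added, and would otherwise close it immediately.
  const onOutsideClick = e => {
    if (e.target.closest('#sidebar, #hamburger-menu-button') === null) {
      watcher.signalClosed();
    }
  };
  document.body.addEventListener('click', onOutsideClick);

  watcher.onclose = () => {
    sidebar.animate([{ transform: 'translateX(0)' }, { transform: 'translateX(-200px)' }]);
    // Stop listening once closed, so listeners do not accumulate.
    document.body.removeEventListener('click', onOutsideClick);
  };
});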

A picker

For a "picker" control that wants to close itself on a user-provided close signal, code like the following would work:

class MyPicker extends HTMLElement {
  #button;
  #overlay;
  #watcher;

  constructor() {
    super();
    this.#button = /* ... */;

    this.#overlay = /* ... */;
    this.#overlay.hidden = true;
    this.#overlay.querySelector('.close-button').addEventListener('click', () => {
      this.#watcher.signalClosed();
    });

    this.#button.onclick = () => {
      this.#overlay.hidden = false;

      this.#watcher = new CloseWatcher();
      this.#watcher.onclose = () => this.#overlay.hidden = true;
    };
  }
}

Platform close signals

With CloseWatcher as a foundation, we can work to unify the web platform's existing and upcoming close signals:

Explaining <dialog>

The <dialog> spec today states that "user agents may provide a user interface that, upon activation, [cancels the dialog]". In particular, here canceling the dialog first fires a cancel event, and if the web developer does not call event.preventDefault(), it will close the dialog.
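
The cancel-then-close sequence can be sketched with plain DOM events. This is a simplified model rather than the specification's algorithm: cancelDialogModel is a hypothetical name, `dialog` can be any EventTarget, and both events are fired synchronously here for brevity.

```javascript
// Toy model of canceling a dialog: fire a cancelable "cancel" event, then
// fire "close" unless a listener called preventDefault().
function cancelDialogModel(dialog) {
  const cancelEvent = new Event('cancel', { cancelable: true });
  // dispatchEvent() returns false when preventDefault() was called.
  const shouldClose = dialog.dispatchEvent(cancelEvent);
  if (shouldClose) {
    dialog.dispatchEvent(new Event('close'));
  }
  return shouldClose;
}
```

On a real page, a developer who wants to keep the dialog open would write something like `dialogEl.addEventListener('cancel', e => e.preventDefault());`.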

The existing <dialog> implementation in Chromium implements this, but only with the Esc key on desktop. That is, on Android Chromium, the system back button will not close <dialog>s. (Perhaps this is because of the fears about back button trapping mentioned above?)

Our proposal is to replace the vague specification sentence above with text based on close watchers. This has a number of benefits:

  • It allows the Android back button to close dialogs, subject to anti-abuse restrictions (e.g. the call to dialogEl.showModal() must be done with user activation or use up the free close watcher slot).

  • It makes it clear how <dialog>s interact with CloseWatcher instances: they both live in the same per-Document close watcher stack.

  • It drives interoperability in terms of user- and developer-facing <dialog> behavior by providing a more concrete specification, e.g. with regard to how the <dialog>'s cancel event is sequenced versus the keydown event for Esc key presses.

Integration with Fullscreen

The Fullscreen spec today states "If the end user instructs the user agent to end a fullscreen session initiated via requestFullscreen(), fully exit fullscreen". Existing Fullscreen implementations implement this using the Esc key on desktop, the back button on Android, and a floating software "x" button on iPadOS. (iOS on iPhones does not appear to implement the fullscreen API.)

We propose replacing this with explicit integration into the close signal steps. Again, this gives interoperability benefits by using a shared primitive, and a clear specification for how it interacts with <dialog>s, CloseWatchers, and key events.

Integration with <popup>

The <popup> proposal would benefit from integration similar to that of <dialog>. This is discussed in openui/open-ui#320.

Integration with <input>?

We could update the specification for <input> to mention that it should use close watchers when the user opens the input's picker UI. This kind of unification fits well with the goals of this project, but it might be tricky since the existing <input> specification is intentionally vague on how UI is presented.

Alternatives considered

Integration with the history API

Because of how the back button on Android is a close signal, one might think it natural to make handling close signals part of either the existing history API, or a revised history API. This idea has some precedent in mobile application frameworks that integrate modals into their "back stack".

However, on the web the history API is intimately tied to navigations, the URL bar, and application state. Using it for UI state is generally not great. See also the above discussion of how developers are forced to use the history API today for this purpose, and how poorly it works. In fact, we're hopeful that by tackling this proposal separately from the history API, other efforts to improve the history API will be able to focus on actual navigations, instead of on close signals.

Note that the line between "UI state" and "a navigation" can be blurry in single-page applications. For example, Twitter.com's logged-in view lets you type directly into a "What's happening?" text box in order to tweet, which we classify as UI state. But if you click the "Tweet" button on the sidebar, it navigates to a new URL which displays a lightbox into which you can input your tweet. In our taxonomy, this new-URL lightbox is a navigation, and it would not be suitable to use the CloseWatcher API for it, because closing it needs to update the URL back to what it was originally (i.e., navigate backwards in the history list).

Automatically translating all close signals to Esc

If we assume that developers already know to handle the Esc key to close their components, then we could potentially translate other close signals, like the Android back button, into Esc key presses. The hope is then that application and component developers wouldn't have to update their code at all: if they're doing the right thing for that common desktop close signal, they would suddenly start doing the right thing on other platforms as well. This is especially attractive as it could help avoid the awkward transition period mentioned in the goals section.

However, upon reflection, such a solution doesn't really solve the general problem. Given an Android back button press, or a PlayStation square button press, or any other gesture which might serve multiple context-dependent purposes, the browser needs to know: should it perform its usual action, or should it be translated to an Esc key press? For custom components, the only way to know is for the web developer to tell the browser that a close-signal-consuming component is open. So the goal of requiring no code modifications, and thereby avoiding the awkward transition period, is impossible to meet. Given this, the strangeness of synthesizing fake Esc key presses does not have much to recommend it.

Not gating on transient user activation

We're gating the creation of more than one CloseWatcher on transient user activation as an anti-abuse measure. However, this protection is stronger than the existing protections against excessive history.pushState() use, which are more vague and less mandatory in that method's spec:

Optionally, return. (For example, the user agent might disallow calls to these methods that are invoked on a timer, or from event listeners that are not triggered in response to a clear user action, or that are invoked in rapid succession.)

We could gate the creation of CloseWatcher on similarly-vague and optional protections. This would allow more free use of it, especially on platforms that don't use the back button as a close signal and so don't need the abuse protection.

But on balance, we'd prefer to start with the transient user activation restriction (plus one "free" CloseWatcher), mainly for reasons of interoperability. Allowing platforms to differ in when CloseWatchers can be created would potentially create a race to the bottom, where nobody can be stricter than the most widely-used implementation.

A potentially-viable alternative would be to try to standardize the anti-abuse measures that browsers are currently doing for history.pushState(), and then use them to underlie close watchers as well. We're definitely open to this idea.

Bundling this with high-level APIs

The proposal here exposes CloseWatcher as a primitive. However, watching for close signals is only a small part of what makes UI components difficult. Some UI components that watch for close signals also need to deal with top layer interaction, blocking interaction with the main document (including trapping focus within the component while it is open), providing appropriate accessibility semantics, and determining the appropriate screen position.

Instead of providing the individual building blocks for all of these pieces, it may be better to bundle them together into high-level semantic elements. We already have <dialog>; we could imagine others such as <popup>, <toast>, <tooltip>, <sidebar>, etc. These would then bundle the appropriate features, e.g. while all of them would benefit from top layer interaction, <toast> and <tooltip> do not need to handle close signals. One such proposal in this area is the <popup> explainer.

Our current thinking is that we should pursue both paths: we should work on bundled high-level APIs, such as the existing <dialog> and the upcoming <popup>, but we should also work on lower-level components, such as close signals or top layer management or focus trapping. And we should, as this document tries to do, ensure these build on top of each other. This gives authors more flexibility for creating their own novel components, without waiting to convince implementers of the value of baking their high-level component into the platform, while still providing a path for evolving the built-in control set over time.

Extension: allowing confirmation before closing

A common developer request when we bring up these scenarios is to allow confirmation before closing a modal. For example, if filling out a form in a popup, they want to intercept any Esc keypresses or Android back button presses and ask "Are you sure you want to close this form and discard what you've entered?"

We've explored what this would look like in terms of extending the CloseWatcher API. The main difficulty is in preventing abuse by a malicious site. As such, we've punted on this for now, hoping to get a useful core API out first to solve (what we hope is) the 80% case. But see this document for details on what such an extension would look like.

We'd especially welcome feedback to confirm or disconfirm our intuition that the no-confirmation case is more prevalent. For example, if you are able to survey the modals in your application and look at how many of them would need confirm functionality versus how many would not, that would be great data to share on the issue tracker.

Security and privacy considerations

This feature has no security considerations to speak of, as a purely DOM API.

Regarding privacy, this proposal does not expose any information that was not already available via other means, such as keydown listeners.

See the W3C TAG Security and Privacy Questionnaire answers for more.

Stakeholder feedback

  • W3C TAG: w3ctag/design-review#594
  • Browser engines:
    • Chromium: Positive; prototyped behind a flag
    • Gecko: No feedback so far
    • WebKit: No feedback so far
  • Web developers: TODO

Acknowledgments

This proposal is based on an earlier analysis by @dvoytenko.
