Class PubSubSubscriptionMaintenance

java.lang.Object
org.jivesoftware.openfire.pubsub.PubSubSubscriptionMaintenance

public class PubSubSubscriptionMaintenance extends Object
Analyzes and (optionally) cleans up redundant rows in the ofPubsubSubscription table.

Some installations have accumulated very large numbers of redundant subscription rows: rows that share the same node, subscription JID, owner and subscription type, differing only by their generated subscription ID. On a node that does not allow multiple subscriptions for the same subscription JID (XEP-0060 §6.1.6) - most notably PEP services (XEP-0163) - at most one such subscription can be meaningful, so the surplus rows carry no functional value. In extreme cases their sheer number exhausts the Java heap when the data is loaded into memory (OF-3306).

This utility is intended to be driven from an admin-console page. It offers three operations:
  • analyze() - a read-only assessment of how much redundant data exists (safe to call on page load).
  • startCleanup() - launches a batched, background deletion of the redundant rows.
  • getProgress() - a thread-safe snapshot of an in-progress or completed cleanup, for a progress bar.

What is deleted

For each group of rows that share (serviceID, nodeID, jid, owner, subscriptionType), exactly one row is kept (the one with the lexicographically greatest id); all others in the group are removed. Groups with only a single row are never touched. The deletion is performed in bounded batches, each in its own transaction, so that it can run against a live server without producing an unmanageably large transaction.
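
The keep-one-per-group rule can be illustrated in memory. The following is a minimal sketch, not the utility's actual SQL: SubscriptionRow and RedundantRowSketch are hypothetical names introduced only for this example, and String.compareTo is assumed to stand in for the database's ordering of id values, which may differ depending on collation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for one ofPubsubSubscription row (not an Openfire type).
final class SubscriptionRow {
    final String serviceID, nodeID, id, jid, owner, subscriptionType;

    SubscriptionRow(String serviceID, String nodeID, String id,
                    String jid, String owner, String subscriptionType) {
        this.serviceID = serviceID;
        this.nodeID = nodeID;
        this.id = id;
        this.jid = jid;
        this.owner = owner;
        this.subscriptionType = subscriptionType;
    }

    // Grouping key: every column named in the text, i.e. everything except the id.
    List<String> groupKey() {
        return Arrays.asList(serviceID, nodeID, jid, owner, subscriptionType);
    }
}

final class RedundantRowSketch {
    // Returns the rows that would be removed: all but the greatest id in each group.
    static List<SubscriptionRow> findRedundant(List<SubscriptionRow> rows) {
        // First pass: remember the row with the lexicographically greatest id per group.
        Map<List<String>, SubscriptionRow> keepers = new HashMap<>();
        for (SubscriptionRow row : rows) {
            keepers.merge(row.groupKey(), row,
                    (kept, candidate) -> kept.id.compareTo(candidate.id) >= 0 ? kept : candidate);
        }
        // Second pass: every row that is not its group's keeper is redundant.
        List<SubscriptionRow> redundant = new ArrayList<>();
        for (SubscriptionRow row : rows) {
            if (keepers.get(row.groupKey()) != row) {
                redundant.add(row);
            }
        }
        return redundant;
    }
}
```

A group with a single row is its own keeper, so it never appears in the result, matching the statement that such groups are never touched.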

Safety with respect to multiple-subscription services

Same-key rows are only redundant on a service that does not allow multiple subscriptions for the same subscription JID (XEP-0060 §6.1.6). On a service that does allow them, such rows are legitimate and are differentiated by their subscription ID; deleting them would destroy live subscriptions. Whether multiple subscriptions are allowed is a service-wide setting (PubSubService.isMultipleSubscriptionsEnabled()): PEP services always return false, while the main pubsub service is governed by the xmpp.pubsub.multiple-subscriptions property.

Because this deletion runs at the database level, it cannot itself consult that in-memory, per-service setting. Instead the caller - which runs inside the server and can enumerate the live services - must supply the set of service IDs that permit multiple subscriptions via the constructor. Those services are excluded from both the analysis and the deletion, so their rows are never counted as removable and never deleted. Inverting the dependency this way keeps the authority for the safety decision with the code that can actually answer the question, rather than having this utility guess.

This utility performs no deletion until startCleanup() is explicitly invoked, and administrators should be advised to take a database backup first. Instances are not designed to run concurrent cleanups; startCleanup() guards against launching a second cleanup while one is already running.
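
How a caller might assemble the set of excluded service IDs can be sketched as follows. This is an illustration under stated assumptions: ServiceView is a hypothetical stand-in interface exposing only the two accessors the sketch needs, and ExcludedServicesSketch is not part of Openfire. How the live services are enumerated is left to the caller.

```java
import java.util.Collection;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for the two service accessors this sketch relies on.
interface ServiceView {
    String getServiceID();
    boolean isMultipleSubscriptionsEnabled();
}

final class ExcludedServicesSketch {
    // Collects the IDs of all live services that permit multiple subscriptions.
    static Set<String> collectExcluded(Collection<? extends ServiceView> liveServices) {
        Set<String> excluded = new TreeSet<>();
        for (ServiceView service : liveServices) {
            if (service.isMultipleSubscriptionsEnabled()) {
                excluded.add(service.getServiceID());
            }
        }
        return excluded;
    }
}
```

The resulting set would then be passed to the constructor or to initialize(). If it is computed incorrectly (for example, left empty on a deployment where the main pubsub service allows multiple subscriptions), legitimate rows become eligible for deletion, which is why the decision rests with the caller.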
  • Constructor Details

    • PubSubSubscriptionMaintenance

      public PubSubSubscriptionMaintenance(@Nonnull Collection<String> multipleSubscriptionServiceIds)
      Creates a maintenance utility that excludes the supplied multiple-subscription services from analysis and cleanup.
      Parameters:
      multipleSubscriptionServiceIds - the IDs of services for which isMultipleSubscriptionsEnabled() is true. Rows belonging to these services are never counted as removable and never deleted. Must not be null; pass an empty collection only when the deployment is known to have no service that permits multiple subscriptions. In practice this set is very small (often just the single main pubsub service); it is rendered into a SQL IN list, so it is not intended to hold thousands of entries.
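
The size caveat follows from how an IN list grows with the number of entries. The sketch below shows one plausible way to render such a list with bind placeholders; it is an assumption for illustration, since the documentation does not say how the utility builds its SQL. ExclusionClauseSketch and notInClause are hypothetical names.

```java
import java.util.Collection;
import java.util.Collections;

final class ExclusionClauseSketch {
    // Builds " AND <column> NOT IN (?, ?, ...)" with one placeholder per excluded ID,
    // or an empty string when there is nothing to exclude.
    static String notInClause(String column, Collection<String> excludedIds) {
        if (excludedIds.isEmpty()) {
            return "";
        }
        String placeholders = String.join(", ", Collections.nCopies(excludedIds.size(), "?"));
        return " AND " + column + " NOT IN (" + placeholders + ")";
    }
}
```

Each excluded ID adds one placeholder (and one bound parameter) to the statement, so a handful of entries is cheap while thousands could run into database limits on statement size or parameter count.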
  • Method Details

    • isCleanupAdvisable

      public static boolean isCleanupAdvisable()
      Returns whether a cleanup is worth recommending, for an advisory on the admin index page. Non-blocking: returns the cached value immediately and, if that value is missing or stale (and no run is in progress), schedules a one-off background refresh. A full analyze() can take many seconds on a very large table, so it must never run on the page-rendering thread; consequently the first index view after startup returns false and the advisory may only appear on a later view, once the background check has completed.
      Returns:
      the cached advisability flag; false until the first background check has completed.
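
The non-blocking behaviour described for isCleanupAdvisable() follows a common cache-with-background-refresh pattern. The sketch below illustrates that pattern only and is not the class's implementation: CachedAdvisorySketch, its constructor parameters and the maxAgeMillis staleness threshold are all assumptions made for the example.

```java
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

final class CachedAdvisorySketch {
    private final BooleanSupplier slowCheck; // stands in for a full, possibly slow analysis
    private final Executor executor;         // runs the refresh off the calling thread
    private final long maxAgeMillis;         // assumed staleness threshold
    private final AtomicBoolean refreshing = new AtomicBoolean(false);
    private volatile boolean cached = false; // default until the first check completes
    private volatile long checkedAt = -1;    // -1 means "never checked"

    CachedAdvisorySketch(BooleanSupplier slowCheck, Executor executor, long maxAgeMillis) {
        this.slowCheck = slowCheck;
        this.executor = executor;
        this.maxAgeMillis = maxAgeMillis;
    }

    // Returns the cached value at once; schedules at most one refresh if it is missing or stale.
    boolean isAdvisable() {
        boolean current = cached;
        boolean stale = checkedAt < 0 || System.currentTimeMillis() - checkedAt > maxAgeMillis;
        if (stale && refreshing.compareAndSet(false, true)) {
            executor.execute(() -> {
                try {
                    cached = slowCheck.getAsBoolean();
                    checkedAt = System.currentTimeMillis();
                } finally {
                    refreshing.set(false); // allow a later refresh even if the check failed
                }
            });
        }
        return current;
    }
}
```

The compareAndSet guard ensures that repeated page views while a refresh is pending do not queue additional checks, which corresponds to the "no run is in progress" condition above.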
    • setCleanupAdvisable

      public static void setCleanupAdvisable(boolean advisable)
      Directly sets the cached advisability value and marks it freshly checked. Used by the cleanup worker on completion, when the outcome is already known, to avoid a redundant re-analysis.
      Parameters:
      advisable - whether a cleanup is now worth recommending.
    • initialize

      public static PubSubSubscriptionMaintenance initialize(@Nonnull Collection<String> multipleSubscriptionServiceIds)
      Initializes the shared maintenance instance with the set of services that permit multiple subscriptions, if it has not already been created. Called by the pubsub module at startup, when the live services can be inspected.
      Parameters:
      multipleSubscriptionServiceIds - service IDs to exclude from analysis and cleanup; see the constructor.
      Returns:
      the shared instance.
    • getInstance

      @Nullable public static PubSubSubscriptionMaintenance getInstance()
      Returns:
      the shared maintenance instance, or null if it has not been initialized yet (the pubsub service has not started). Callers that need an instance before startup should treat null as "not yet available".
    • getExcludedServiceIds

      @Nonnull public Set<String> getExcludedServiceIds()
      Returns:
      the service IDs excluded from analysis and cleanup (those permitting multiple subscriptions). Never null.
    • analyze

      @Nonnull public PubSubSubscriptionMaintenance.Analysis analyze() throws SQLException
      Performs a read-only assessment of the redundant-row situation. This issues a single aggregate query. On a very large table it can take many seconds (a full scan), but it neither locks rows for writing nor modifies any data, so it is safe, in terms of data integrity, to call when rendering an admin page. Pages that must not block on it, such as the admin index page, should use isCleanupAdvisable() instead.
      Returns:
      the analysis result, never null.
      Throws:
      SQLException - if the database could not be queried.
    • startCleanup

      public boolean startCleanup()
      Launches a cleanup on a background thread, unless one is already running. The cleanup deletes redundant rows in batches (see DELETE_BATCH_SIZE), committing after each batch and updating getProgress() as it goes. Control returns to the caller immediately; the admin page should poll getProgress() to render a progress indicator.
      Returns:
      true if a new cleanup was started; false if one was already running.
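
The single-run guard and the batch loop described for startCleanup() can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual worker: BatchedCleanupSketch is a hypothetical class, and deleteOneBatch is a hypothetical callback assumed to delete up to one batch of rows in its own transaction and return the number of rows removed.

```java
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.IntSupplier;

final class BatchedCleanupSketch {
    private final AtomicBoolean running = new AtomicBoolean(false);
    private final AtomicLong deleted = new AtomicLong();
    private final IntSupplier deleteOneBatch; // hypothetical: one committed batch per call
    private final Executor executor;

    BatchedCleanupSketch(IntSupplier deleteOneBatch, Executor executor) {
        this.deleteOneBatch = deleteOneBatch;
        this.executor = executor;
    }

    // Returns true if a new run was started, false if one was already running.
    boolean start() {
        if (!running.compareAndSet(false, true)) {
            return false;
        }
        try {
            executor.execute(() -> {
                try {
                    int removed;
                    // Keep deleting until a batch removes nothing.
                    while ((removed = deleteOneBatch.getAsInt()) > 0) {
                        deleted.addAndGet(removed); // progress visible to pollers
                    }
                } finally {
                    running.set(false);
                }
            });
        } catch (RuntimeException e) {
            running.set(false); // the task was never accepted
            throw e;
        }
        return true;
    }

    long deletedSoFar() { return deleted.get(); }

    boolean isRunning() { return running.get(); }
}
```

Because the counter and the running flag are atomic, a polling thread can read them at any time without coordinating with the worker, which is the property a progress snapshot needs.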
    • getProgress

      @Nonnull public PubSubSubscriptionMaintenance.Progress getProgress()
      Returns:
      a snapshot of the current (or most recently completed) cleanup progress. Never null.