res_pjsip_pubsub: Address SEGV when attempting to terminate a subscription
authorGeorge Joseph <gjoseph@digium.com>
Sun, 12 Jun 2016 16:19:27 +0000 (10:19 -0600)
committerGeorge Joseph <gjoseph@digium.com>
Tue, 21 Jun 2016 18:50:24 +0000 (13:50 -0500)
Occasionally under load we'll attempt to send a final NOTIFY on a
subscription that's already been terminated and a SEGV will occur
down in pjproject's evsub_destroy function.  This is a result of a
race condition between all the paths that can generate a notify
and/or destroy the underlying pjproject evsub object:

 * The client can send a SUBSCRIBE with Expires: 0.
 * The client can send a SUBSCRIBE/refresh.
 * The subscription timer can expire.
 * An extension state can change.
 * An MWI event can be generated.
 * The pjproject transaction timer (timer_b) can expire.

Normally when our pubsub_on_evsub_state is called with a terminate,
we push a task to the serializer and return at which point the dialog
is unlocked.  This is usually not a problem because the task runs
immediately and locks the dialog again.  When the system is heavily
loaded though, there may be a delay between the unlock and relock
during which another event may occur such as the subscription timer
or timer_b expiring, an extension state change, etc.  These may also
cause a terminate to be processed and if so, we could cause pjproject
to try to destroy the evsub structure twice.  There's no way for us to
tell that the evsub was already destroyed and the evsub's group lock
can't tolerate this and SEGVs.

The remedy is twofold.

 * A patch has been submitted to Teluu and added to the bundled
   pjproject which adds add/decrement operations on evsub's group lock.

 * In res_pjsip_pubsub:
   * configure.ac and pjproject-bundled's configure.m4 were updated
     to check for the new evsub group lock APIs.
   * We now add a reference to the evsub group lock when we create
     the subscription and remove the reference when we clean up the
     subscription.  This prevents evsub from being destroyed before
     we're done with it.
   * A state has been added to the subscription tree structure so
     termination progress can be tracked through the asyncronous tasks.
   * The pubsub_on_evsub_state callback has been split so it's not doing
     double duty.  It now only handles the final cleanup of the
     subscription tree.  pubsub_on_rx_refresh now handles both client
     refreshes and client terminates.  It was always being called for
     both anyway.
   * The serialized_on_server_timeout task was removed since
     serialized_pubsub_on_rx_refresh was almost identical.
   * Missing state checks and ao2_cleanups were added.
   * Some debug levels were adjusted to make seeing only off-nominal
     things at level 1 and nominal or progress things at level 2+.

ASTERISK-26099 #close
Reported-by: Ross Beer.

Change-Id: I779d11802cf672a51392e62a74a1216596075ba1

configure
configure.ac
include/asterisk/autoconfig.h.in
res/res_pjsip_pubsub.c
third-party/pjproject/configure.m4
third-party/pjproject/patches/0001-evsub-Add-APIs-to-add-decrement-an-event-subscriptio.patch [new file with mode: 0644]

index 04183b4..3aa6004 100755 (executable)
--- a/configure
+++ b/configure
@@ -923,6 +923,10 @@ PBX_POPT
 POPT_DIR
 POPT_INCLUDE
 POPT_LIB
+PBX_PJSIP_EVSUB_GRP_LOCK
+PJSIP_EVSUB_GRP_LOCK_DIR
+PJSIP_EVSUB_GRP_LOCK_INCLUDE
+PJSIP_EVSUB_GRP_LOCK_LIB
 PBX_PJSIP_TLS_TRANSPORT_PROTO
 PJSIP_TLS_TRANSPORT_PROTO_DIR
 PJSIP_TLS_TRANSPORT_PROTO_INCLUDE
@@ -10576,6 +10580,18 @@ PBX_PJSIP_TLS_TRANSPORT_PROTO=0
 
 
 
+PJSIP_EVSUB_GRP_LOCK_DESCRIP="PJSIP EVSUB Group Lock support"
+PJSIP_EVSUB_GRP_LOCK_OPTION=pjsip
+PJSIP_EVSUB_GRP_LOCK_DIR=${PJPROJECT_DIR}
+
+PBX_PJSIP_EVSUB_GRP_LOCK=0
+
+
+
+
+
+
+
 
     POPT_DESCRIP="popt"
     POPT_OPTION="popt"
@@ -13775,7 +13791,7 @@ else
     We can't simply define LARGE_OFF_T to be 9223372036854775807,
     since some C++ compilers masquerading as C compilers
     incorrectly reject 9223372036854775807.  */
-#define LARGE_OFF_T ((((off_t) 1 << 31) << 31) - 1 + (((off_t) 1 << 31) << 31))
+#define LARGE_OFF_T (((off_t) 1 << 62) - 1 + ((off_t) 1 << 62))
   int off_t_is_large[(LARGE_OFF_T % 2147483629 == 721
                       && LARGE_OFF_T % 2147483647 == 1)
                      ? 1 : -1];
@@ -13821,7 +13837,7 @@ else
     We can't simply define LARGE_OFF_T to be 9223372036854775807,
     since some C++ compilers masquerading as C compilers
     incorrectly reject 9223372036854775807.  */
-#define LARGE_OFF_T ((((off_t) 1 << 31) << 31) - 1 + (((off_t) 1 << 31) << 31))
+#define LARGE_OFF_T (((off_t) 1 << 62) - 1 + ((off_t) 1 << 62))
   int off_t_is_large[(LARGE_OFF_T % 2147483629 == 721
                       && LARGE_OFF_T % 2147483647 == 1)
                      ? 1 : -1];
@@ -13845,7 +13861,7 @@ rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
     We can't simply define LARGE_OFF_T to be 9223372036854775807,
     since some C++ compilers masquerading as C compilers
     incorrectly reject 9223372036854775807.  */
-#define LARGE_OFF_T ((((off_t) 1 << 31) << 31) - 1 + (((off_t) 1 << 31) << 31))
+#define LARGE_OFF_T (((off_t) 1 << 62) - 1 + ((off_t) 1 << 62))
   int off_t_is_large[(LARGE_OFF_T % 2147483629 == 721
                       && LARGE_OFF_T % 2147483647 == 1)
                      ? 1 : -1];
@@ -13890,7 +13906,7 @@ else
     We can't simply define LARGE_OFF_T to be 9223372036854775807,
     since some C++ compilers masquerading as C compilers
     incorrectly reject 9223372036854775807.  */
-#define LARGE_OFF_T ((((off_t) 1 << 31) << 31) - 1 + (((off_t) 1 << 31) << 31))
+#define LARGE_OFF_T (((off_t) 1 << 62) - 1 + ((off_t) 1 << 62))
   int off_t_is_large[(LARGE_OFF_T % 2147483629 == 721
                       && LARGE_OFF_T % 2147483647 == 1)
                      ? 1 : -1];
@@ -13914,7 +13930,7 @@ rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
     We can't simply define LARGE_OFF_T to be 9223372036854775807,
     since some C++ compilers masquerading as C compilers
     incorrectly reject 9223372036854775807.  */
-#define LARGE_OFF_T ((((off_t) 1 << 31) << 31) - 1 + (((off_t) 1 << 31) << 31))
+#define LARGE_OFF_T (((off_t) 1 << 62) - 1 + ((off_t) 1 << 62))
   int off_t_is_large[(LARGE_OFF_T % 2147483629 == 721
                       && LARGE_OFF_T % 2147483647 == 1)
                      ? 1 : -1];
@@ -24740,6 +24756,9 @@ rm -f conftest*
 $as_echo "#define HAVE_PJSIP_TLS_TRANSPORT_PROTO 1" >>confdefs.h
 
 
+$as_echo "#define HAVE_PJSIP_EVSUB_GRP_LOCK 1" >>confdefs.h
+
+
    else
 
    if test "x${PBX_PJPROJECT}" != "x1" -a "${USE_PJPROJECT}" != "no"; then
@@ -25455,6 +25474,111 @@ rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
 
       LIBS="${saved_libs}"
       CPPFLAGS="${saved_cppflags}"
+
+
+if test "x${PBX_PJSIP_EVSUB_GRP_LOCK}" != "x1" -a "${USE_PJSIP_EVSUB_GRP_LOCK}" != "no"; then
+   pbxlibdir=""
+   # if --with-PJSIP_EVSUB_GRP_LOCK=DIR has been specified, use it.
+   if test "x${PJSIP_EVSUB_GRP_LOCK_DIR}" != "x"; then
+      if test -d ${PJSIP_EVSUB_GRP_LOCK_DIR}/lib; then
+         pbxlibdir="-L${PJSIP_EVSUB_GRP_LOCK_DIR}/lib"
+      else
+         pbxlibdir="-L${PJSIP_EVSUB_GRP_LOCK_DIR}"
+      fi
+   fi
+   pbxfuncname="pjsip_evsub_add_lock"
+   if test "x${pbxfuncname}" = "x" ; then   # empty lib, assume only headers
+      AST_PJSIP_EVSUB_GRP_LOCK_FOUND=yes
+   else
+      ast_ext_lib_check_save_CFLAGS="${CFLAGS}"
+      CFLAGS="${CFLAGS} $PJPROJECT_CFLAGS"
+      as_ac_Lib=`$as_echo "ac_cv_lib_pjsip_${pbxfuncname}" | $as_tr_sh`
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for ${pbxfuncname} in -lpjsip" >&5
+$as_echo_n "checking for ${pbxfuncname} in -lpjsip... " >&6; }
+if eval \${$as_ac_Lib+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+  ac_check_lib_save_LIBS=$LIBS
+LIBS="-lpjsip ${pbxlibdir} $PJPROJECT_LIB $LIBS"
+cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+
+/* Override any GCC internal prototype to avoid an error.
+   Use char because int might match the return type of a GCC
+   builtin and then its argument prototype would still apply.  */
+#ifdef __cplusplus
+extern "C"
+#endif
+char ${pbxfuncname} ();
+int
+main ()
+{
+return ${pbxfuncname} ();
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_link "$LINENO"; then :
+  eval "$as_ac_Lib=yes"
+else
+  eval "$as_ac_Lib=no"
+fi
+rm -f core conftest.err conftest.$ac_objext \
+    conftest$ac_exeext conftest.$ac_ext
+LIBS=$ac_check_lib_save_LIBS
+fi
+eval ac_res=\$$as_ac_Lib
+              { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_res" >&5
+$as_echo "$ac_res" >&6; }
+if eval test \"x\$"$as_ac_Lib"\" = x"yes"; then :
+  AST_PJSIP_EVSUB_GRP_LOCK_FOUND=yes
+else
+  AST_PJSIP_EVSUB_GRP_LOCK_FOUND=no
+fi
+
+      CFLAGS="${ast_ext_lib_check_save_CFLAGS}"
+   fi
+
+   # now check for the header.
+   if test "${AST_PJSIP_EVSUB_GRP_LOCK_FOUND}" = "yes"; then
+      PJSIP_EVSUB_GRP_LOCK_LIB="${pbxlibdir} -lpjsip $PJPROJECT_LIB"
+      # if --with-PJSIP_EVSUB_GRP_LOCK=DIR has been specified, use it.
+      if test "x${PJSIP_EVSUB_GRP_LOCK_DIR}" != "x"; then
+         PJSIP_EVSUB_GRP_LOCK_INCLUDE="-I${PJSIP_EVSUB_GRP_LOCK_DIR}/include"
+      fi
+      PJSIP_EVSUB_GRP_LOCK_INCLUDE="${PJSIP_EVSUB_GRP_LOCK_INCLUDE} $PJPROJECT_CFLAGS"
+      if test "xpjsip.h" = "x" ; then  # no header, assume found
+         PJSIP_EVSUB_GRP_LOCK_HEADER_FOUND="1"
+      else                             # check for the header
+         ast_ext_lib_check_saved_CPPFLAGS="${CPPFLAGS}"
+         CPPFLAGS="${CPPFLAGS} ${PJSIP_EVSUB_GRP_LOCK_INCLUDE}"
+         ac_fn_c_check_header_mongrel "$LINENO" "pjsip.h" "ac_cv_header_pjsip_h" "$ac_includes_default"
+if test "x$ac_cv_header_pjsip_h" = xyes; then :
+  PJSIP_EVSUB_GRP_LOCK_HEADER_FOUND=1
+else
+  PJSIP_EVSUB_GRP_LOCK_HEADER_FOUND=0
+fi
+
+
+         CPPFLAGS="${ast_ext_lib_check_saved_CPPFLAGS}"
+      fi
+      if test "x${PJSIP_EVSUB_GRP_LOCK_HEADER_FOUND}" = "x0" ; then
+         PJSIP_EVSUB_GRP_LOCK_LIB=""
+         PJSIP_EVSUB_GRP_LOCK_INCLUDE=""
+      else
+         if test "x${pbxfuncname}" = "x" ; then                # only checking headers -> no library
+            PJSIP_EVSUB_GRP_LOCK_LIB=""
+         fi
+         PBX_PJSIP_EVSUB_GRP_LOCK=1
+         cat >>confdefs.h <<_ACEOF
+#define HAVE_PJSIP_EVSUB_GRP_LOCK 1
+_ACEOF
+
+      fi
+   fi
+fi
+
+
    fi
 fi
 
index 2c47dc4..72a47e2 100644 (file)
@@ -486,6 +486,7 @@ AST_EXT_LIB_SETUP_OPTIONAL([PJSIP_GET_DEST_INFO], [pjsip_get_dest_info support],
 AST_EXT_LIB_SETUP_OPTIONAL([PJ_SSL_CERT_LOAD_FROM_FILES2], [pj_ssl_cert_load_from_files2 support], [PJPROJECT], [pjsip])
 AST_EXT_LIB_SETUP_OPTIONAL([PJSIP_EXTERNAL_RESOLVER], [PJSIP External Resolver Support], [PJPROJECT], [pjsip])
 AST_EXT_LIB_SETUP_OPTIONAL([PJSIP_TLS_TRANSPORT_PROTO], [PJSIP TLS Transport proto field support], [PJPROJECT], [pjsip])
+AST_EXT_LIB_SETUP_OPTIONAL([PJSIP_EVSUB_GRP_LOCK], [PJSIP EVSUB Group Lock support], [PJPROJECT], [pjsip])
 
 AST_EXT_LIB_SETUP([POPT], [popt], [popt])
 AST_EXT_LIB_SETUP([PORTAUDIO], [PortAudio], [portaudio])
@@ -2209,6 +2210,8 @@ if test "$USE_PJPROJECT" != "no" ; then
       AST_C_COMPILE_CHECK([PJSIP_TLS_TRANSPORT_PROTO], [struct pjsip_tls_setting setting; int proto; proto = setting.proto;], [pjsip.h])
       LIBS="${saved_libs}"
       CPPFLAGS="${saved_cppflags}"
+
+      AST_EXT_LIB_CHECK([PJSIP_EVSUB_GRP_LOCK], [pjsip], [pjsip_evsub_add_lock], [pjsip.h], [$PJPROJECT_LIB], [$PJPROJECT_CFLAGS])
    fi
 fi
 
index 7224212..cd228d7 100644 (file)
 /* Define if your system has pjsip_dlg_create_uas_and_inc_lock declared. */
 #undef HAVE_PJSIP_DLG_CREATE_UAS_AND_INC_LOCK
 
+/* Define if your system has PJSIP_EVSUB_GRP_LOCK */
+#undef HAVE_PJSIP_EVSUB_GRP_LOCK
+
 /* Define if your system has pjsip_endpt_set_ext_resolver declared. */
 #undef HAVE_PJSIP_EXTERNAL_RESOLVER
 
index 06a1b52..4e41809 100644 (file)
@@ -378,6 +378,20 @@ struct subscription_persistence {
 };
 
 /*!
+ * \brief The state of the subscription tree
+ */
+enum sip_subscription_tree_state {
+       /*! Normal operation */
+       SIP_SUB_TREE_NORMAL = 0,
+       /*! A terminate has been requested by Asterisk, the client, or pjproject */
+       SIP_SUB_TREE_TERMINATE_PENDING,
+       /*! The terminate is in progress */
+       SIP_SUB_TREE_TERMINATE_IN_PROGRESS,
+       /*! The terminate process has finished and the subscription tree is no longer valid */
+       SIP_SUB_TREE_TERMINATED,
+};
+
+/*!
  * \brief A tree of SIP subscriptions
  *
  * Because of the ability to subscribe to resource lists, a SIP
@@ -411,8 +425,8 @@ struct sip_subscription_tree {
        int is_list;
        /*! Next item in the list */
        AST_LIST_ENTRY(sip_subscription_tree) next;
-       /*! Indicates that a NOTIFY is currently being sent on the SIP subscription */
-       int last_notify;
+       /*! Subscription tree state */
+       enum sip_subscription_tree_state state;
 };
 
 /*!
@@ -879,15 +893,15 @@ static void build_node_children(struct ast_sip_endpoint *endpoint, const struct
                                                        "allocation error afterwards\n", resource);
                                        continue;
                                }
-                               ast_debug(1, "Subscription to leaf resource %s resulted in success. Adding to parent %s\n",
+                               ast_debug(2, "Subscription to leaf resource %s resulted in success. Adding to parent %s\n",
                                                resource, parent->resource);
                                AST_VECTOR_APPEND(&parent->children, current);
                        } else {
-                               ast_debug(1, "Subscription to leaf resource %s resulted in error response %d\n",
+                               ast_debug(2, "Subscription to leaf resource %s resulted in error response %d\n",
                                                resource, resp);
                        }
                } else {
-                       ast_debug(1, "Resource %s (child of %s) is a list\n", resource, parent->resource);
+                       ast_debug(2, "Resource %s (child of %s) is a list\n", resource, parent->resource);
                        current = tree_node_alloc(resource, visited, child_list->full_state);
                        if (!current) {
                                ast_debug(1, "Cannot build children of resource %s due to allocation failure\n", resource);
@@ -898,7 +912,7 @@ static void build_node_children(struct ast_sip_endpoint *endpoint, const struct
                                ast_debug(1, "List %s had no successful children.\n", resource);
                                AST_VECTOR_APPEND(&parent->children, current);
                        } else {
-                               ast_debug(1, "List %s had successful children. Adding to parent %s\n",
+                               ast_debug(2, "List %s had successful children. Adding to parent %s\n",
                                                resource, parent->resource);
                                tree_node_destroy(current);
                        }
@@ -970,7 +984,7 @@ static int build_resource_tree(struct ast_sip_endpoint *endpoint, const struct a
        struct resources visited;
 
        if (!has_eventlist_support || !(list = retrieve_resource_list(resource, handler->event_name))) {
-               ast_debug(1, "Subscription to resource %s is not to a list\n", resource);
+               ast_debug(2, "Subscription to resource %s is not to a list\n", resource);
                tree->root = tree_node_alloc(resource, NULL, 0);
                if (!tree->root) {
                        return 500;
@@ -978,7 +992,7 @@ static int build_resource_tree(struct ast_sip_endpoint *endpoint, const struct a
                return handler->notifier->new_subscribe(endpoint, resource);
        }
 
-       ast_debug(1, "Subscription to resource %s is a list\n", resource);
+       ast_debug(2, "Subscription to resource %s is a list\n", resource);
        if (AST_VECTOR_INIT(&visited, AST_VECTOR_SIZE(&list->items))) {
                return 500;
        }
@@ -1015,7 +1029,7 @@ static void remove_subscription(struct sip_subscription_tree *obj)
                if (i == obj) {
                        AST_RWLIST_REMOVE_CURRENT(next);
                        if (i->root) {
-                               ast_debug(1, "Removing subscription to resource %s from list of subscriptions\n",
+                               ast_debug(2, "Removing subscription to resource %s from list of subscriptions\n",
                                                ast_sip_subscription_get_resource_name(i->root));
                        }
                        break;
@@ -1307,6 +1321,10 @@ static struct sip_subscription_tree *create_subscription_tree(const struct ast_s
        pjsip_evsub_create_uas(dlg, &pubsub_cb, rdata, 0, &sub_tree->evsub);
        subscription_setup_dialog(sub_tree, dlg);
 
+#ifdef HAVE_PJSIP_EVSUB_GRP_LOCK
+       pjsip_evsub_add_ref(sub_tree->evsub);
+#endif
+
        ast_sip_mod_data_set(dlg->pool, dlg->mod_data, pubsub_module.id, MOD_DATA_MSG,
                        pjsip_msg_clone(dlg->pool, rdata->msg_info.msg));
 
@@ -2230,10 +2248,8 @@ static int send_notify(struct sip_subscription_tree *sub_tree, unsigned int forc
                pjsip_msg_add_hdr(tdata->msg, (pjsip_hdr *) require);
        }
 
-       if (sub_tree->root->subscription_state == PJSIP_EVSUB_STATE_TERMINATED) {
-               sub_tree->last_notify = 1;
-       }
        if (sip_subscription_send_request(sub_tree, tdata)) {
+               pjsip_tx_data_dec_ref(tdata);
                return -1;
        }
 
@@ -2248,21 +2264,32 @@ static int serialized_send_notify(void *userdata)
        pjsip_dialog *dlg = sub_tree->dlg;
 
        pjsip_dlg_inc_lock(dlg);
+
        /* It's possible that between when the notification was scheduled
-        * and now, that a new SUBSCRIBE arrived, requiring full state to be
-        * sent out in an immediate NOTIFY. If that has happened, we need to
+        * and now a new SUBSCRIBE arrived requiring full state to be
+        * sent out in an immediate NOTIFY. It's also possible that we're
+        * already processing a terminate.  If that has happened, we need to
         * bail out here instead of sending the batched NOTIFY.
         */
-       if (!sub_tree->send_scheduled_notify) {
+
+       if (sub_tree->state >= SIP_SUB_TREE_TERMINATE_IN_PROGRESS
+               || !sub_tree->send_scheduled_notify) {
                pjsip_dlg_dec_lock(dlg);
                ao2_cleanup(sub_tree);
                return 0;
        }
 
+       if (sub_tree->root->subscription_state == PJSIP_EVSUB_STATE_TERMINATED) {
+               sub_tree->state = SIP_SUB_TREE_TERMINATE_IN_PROGRESS;
+       }
+
        send_notify(sub_tree, 0);
-       ast_test_suite_event_notify("SUBSCRIPTION_STATE_CHANGED",
-                       "Resource: %s",
-                       sub_tree->root->resource);
+
+       ast_test_suite_event_notify(
+               sub_tree->state == SIP_SUB_TREE_TERMINATED
+                       ? "SUBSCRIPTION_TERMINATED" : "SUBSCRIPTION_STATE_CHANGED",
+               "Resource: %s", sub_tree->root->resource);
+
        sub_tree->notify_sched_id = -1;
        pjsip_dlg_dec_lock(dlg);
        ao2_cleanup(sub_tree);
@@ -2274,7 +2301,10 @@ static int sched_cb(const void *data)
        struct sip_subscription_tree *sub_tree = (struct sip_subscription_tree *) data;
 
        /* We don't need to bump the refcount of sub_tree since we bumped it when scheduling this task */
-       ast_sip_push_task(sub_tree->serializer, serialized_send_notify, sub_tree);
+       if (ast_sip_push_task(sub_tree->serializer, serialized_send_notify, sub_tree)) {
+               ao2_cleanup(sub_tree);
+       }
+
        return 0;
 }
 
@@ -2285,12 +2315,13 @@ static int schedule_notification(struct sip_subscription_tree *sub_tree)
                return 0;
        }
 
+       sub_tree->send_scheduled_notify = 1;
        sub_tree->notify_sched_id = ast_sched_add(sched, sub_tree->notification_batch_interval, sched_cb, ao2_bump(sub_tree));
        if (sub_tree->notify_sched_id < 0) {
+               ao2_cleanup(sub_tree);
                return -1;
        }
 
-       sub_tree->send_scheduled_notify = 1;
        return 0;
 }
 
@@ -2302,7 +2333,7 @@ int ast_sip_subscription_notify(struct ast_sip_subscription *sub, struct ast_sip
 
        pjsip_dlg_inc_lock(dlg);
 
-       if (!sub->tree->evsub) {
+       if (sub->tree->state != SIP_SUB_TREE_NORMAL) {
                pjsip_dlg_dec_lock(dlg);
                return 0;
        }
@@ -2316,6 +2347,7 @@ int ast_sip_subscription_notify(struct ast_sip_subscription *sub, struct ast_sip
        sub->body_changed = 1;
        if (terminate) {
                sub->subscription_state = PJSIP_EVSUB_STATE_TERMINATED;
+               sub->tree->state = SIP_SUB_TREE_TERMINATE_PENDING;
        }
 
        if (sub->tree->notification_batch_interval) {
@@ -2323,6 +2355,9 @@ int ast_sip_subscription_notify(struct ast_sip_subscription *sub, struct ast_sip
        } else {
                /* See the note in pubsub_on_rx_refresh() for why sub->tree is refbumped here */
                ao2_ref(sub->tree, +1);
+               if (terminate) {
+                       sub->tree->state = SIP_SUB_TREE_TERMINATE_IN_PROGRESS;
+               }
                res = send_notify(sub->tree, 0);
                ast_test_suite_event_notify(terminate ? "SUBSCRIPTION_TERMINATED" : "SUBSCRIPTION_STATE_CHANGED",
                                "Resource: %s",
@@ -3268,71 +3303,72 @@ static void set_state_terminated(struct ast_sip_subscription *sub)
        }
 }
 
-/* XXX This function and serialized_pubsub_on_rx_refresh are nearly identical */
-static int serialized_pubsub_on_server_timeout(void *userdata)
-{
-       struct sip_subscription_tree *sub_tree = userdata;
-       pjsip_dialog *dlg = sub_tree->dlg;
-
-       pjsip_dlg_inc_lock(dlg);
-       if (!sub_tree->evsub) {
-               pjsip_dlg_dec_lock(dlg);
-               return 0;
-       }
-       set_state_terminated(sub_tree->root);
-       send_notify(sub_tree, 1);
-       ast_test_suite_event_notify("SUBSCRIPTION_TERMINATED",
-                       "Resource: %s",
-                       sub_tree->root->resource);
-
-       pjsip_dlg_dec_lock(dlg);
-       ao2_cleanup(sub_tree);
-       return 0;
-}
-
 /*!
- * \brief PJSIP callback when underlying SIP subscription changes state
- *
- * This callback is a bit of a mess, because it's not always called when
- * you might expect it to be, and it can be called multiple times for the
- * same state.
+ * \brief Callback sequence for subscription terminate:
  *
- * For instance, this function is not called at all when an incoming SUBSCRIBE
- * arrives to refresh a subscription. That makes sense in a way, since the
- * subscription state has not made a change; it was active and remains active.
+ * * Client initiated:
+ *     pjproject receives SUBSCRIBE on the subscription's serializer thread
+ *         calls pubsub_on_rx_refresh with dialog locked
+ *             pubsub_on_rx_refresh sets TERMINATE_PENDING
+ *             pushes serialized_pubsub_on_refresh_timeout
+ *             returns to pjproject
+ *         pjproject calls pubsub_on_evsub_state
+ *             pubsub_evsub_set_state checks state == TERMINATE_IN_PROGRESS (no)
+ *             ignore and return
+ *         pjproject unlocks dialog
+ *     serialized_pubsub_on_refresh_timeout starts (1)
+ *       locks dialog
+ *       checks state == TERMINATE_PENDING
+ *       sets TERMINATE_IN_PROGRESS
+ *       calls send_notify (2)
+ *           send_notify ultimately calls pjsip_evsub_send_request
+ *               pjsip_evsub_send_request calls evsub's set_state
+ *                   set_state calls pubsub_evsub_set_state
+ *                       pubsub_evsub_set_state checks state == TERMINATE_IN_PROGRESS
+ *                       removes the subscriptions
+ *                       cleans up references to evsub
+ *                       sets state = TERMINATED
+ *       serialized_pubsub_on_refresh_timeout unlocks dialog
  *
- * However, if an incoming SUBSCRIBE arrives to end a subscription, then this
- * will be called into once upon receiving the SUBSCRIBE (after the call to
- * pubsub_on_rx_refresh) and again when sending a NOTIFY to end the subscription.
- * In both cases, the apparent state of the subscription is "terminated".
+ * * Subscription timer expires:
+ *     pjproject timer expires
+ *         locks dialog
+ *         calls pubsub_on_server_timeout
+ *             pubsub_on_server_timeout checks state == NORMAL
+ *             sets TERMINATE_PENDING
+ *             pushes serialized_pubsub_on_refresh_timeout
+ *             returns to pjproject
+ *         pjproject unlocks dialog
+ *     serialized_pubsub_on_refresh_timeout starts
+ *         See (1) Above
  *
- * However, the double-terminated state changes don't happen in all cases. For
- * instance, if a subscription expires, then the only time this callback is
- * called is when we send the NOTIFY to end the subscription.
  *
- * As far as state changes are concerned, we only ever care about transitions
- * to the "terminated" state. The action we take here is dependent on the
- * conditions behind why the state change to "terminated" occurred. If the
- * state change has occurred because we are sending a NOTIFY to end the
- * subscription, we consider this to be the final hurrah of the subscription
- * and take measures to start shutting things down. If the state change to
- * terminated occurs for a different reason (e.g. transaction timeout,
- * incoming SUBSCRIBE to end the subscription), then we push a task to
- * send out a NOTIFY. When that NOTIFY is sent, this callback will be
- * called again and we will actually shut down the subscription. The
- * subscription tree's last_notify field let's us know if this is being
- * called as a result of a terminating NOTIFY or not.
+ * * ast_sip_subscription_notify is called
+ *       checks state == NORMAL
+ *       if not batched...
+ *           sets TERMINATE_IN_PROGRESS (if terminate is requested)
+ *           calls send_notify
+ *               See (2) Above
+ *       if batched...
+ *           sets TERMINATE_PENDING
+ *           schedules task
+ *       scheduler runs sched_task
+ *           sched_task pushes serialized_send_notify
+ *       serialized_send_notify starts
+ *           checks state <= TERMINATE_PENDING
+ *           if state == TERMINATE_PENDING set state = TERMINATE_IN_PROGRESS
+ *           call send_notify
+ *               See (2) Above
  *
- * There is no guarantee that this function will be called from a serializer
- * thread since it can be called due to a transaction timeout. Therefore
- * synchronization primitives are necessary to ensure that no operations
- * step on each others' toes. The dialog lock is always held when this
- * callback is called, so we ensure that relevant structures that may
- * be touched in this function are always protected by the dialog lock
- * elsewhere as well. The dialog lock in particular protects
+ */
+
+/*!
+ * \brief PJSIP callback when underlying SIP subscription changes state
  *
- * \li The subscription tree's last_notify field
- * \li The subscription tree's evsub pointer
+ * Although this function is called for every state change, we only care
+ * about the TERMINATED state, and only when we're actually processing the final
+ * notify (SIP_SUB_TREE_TERMINATE_IN_PROGRESS).  In this case, we do all
+ * the subscription tree cleanup tasks and decrement the evsub reference.
  */
 static void pubsub_on_evsub_state(pjsip_evsub *evsub, pjsip_event *event)
 {
@@ -3345,51 +3381,55 @@ static void pubsub_on_evsub_state(pjsip_evsub *evsub, pjsip_event *event)
        }
 
        sub_tree = pjsip_evsub_get_mod_data(evsub, pubsub_module.id);
-       if (!sub_tree) {
+       if (!sub_tree || sub_tree->state != SIP_SUB_TREE_TERMINATE_IN_PROGRESS) {
+               ast_debug(1, "Possible terminate race prevented %p\n", sub_tree);
                return;
        }
 
-       if (!sub_tree->last_notify) {
-               if (ast_sip_push_task(sub_tree->serializer, serialized_pubsub_on_server_timeout, ao2_bump(sub_tree))) {
-                       ast_log(LOG_ERROR, "Failed to push task to send final NOTIFY.\n");
-                       ao2_ref(sub_tree, -1);
-               } else {
-                       return;
-               }
-       }
-
        remove_subscription(sub_tree);
+
        pjsip_evsub_set_mod_data(evsub, pubsub_module.id, NULL);
+
+#ifdef HAVE_PJSIP_EVSUB_GRP_LOCK
+       pjsip_evsub_dec_ref(sub_tree->evsub);
+#endif
+
        sub_tree->evsub = NULL;
+
        ast_sip_dialog_set_serializer(sub_tree->dlg, NULL);
        ast_sip_dialog_set_endpoint(sub_tree->dlg, NULL);
+
        subscription_persistence_remove(sub_tree);
        shutdown_subscriptions(sub_tree->root);
 
+       sub_tree->state = SIP_SUB_TREE_TERMINATED;
        /* Remove evsub's reference to the sub_tree */
        ao2_ref(sub_tree, -1);
 }
 
-static int serialized_pubsub_on_rx_refresh(void *userdata)
+static int serialized_pubsub_on_refresh_timeout(void *userdata)
 {
        struct sip_subscription_tree *sub_tree = userdata;
        pjsip_dialog *dlg = sub_tree->dlg;
 
        pjsip_dlg_inc_lock(dlg);
-       if (!sub_tree->evsub) {
+       if (sub_tree->state >= SIP_SUB_TREE_TERMINATE_IN_PROGRESS) {
+               ast_debug(1, "Possible terminate race prevented %p %d\n", sub_tree->evsub, sub_tree->state);
                pjsip_dlg_dec_lock(dlg);
+               ao2_cleanup(sub_tree);
                return 0;
        }
 
-       if (pjsip_evsub_get_state(sub_tree->evsub) == PJSIP_EVSUB_STATE_TERMINATED) {
+       if (sub_tree->state == SIP_SUB_TREE_TERMINATE_PENDING) {
+               sub_tree->state = SIP_SUB_TREE_TERMINATE_IN_PROGRESS;
                set_state_terminated(sub_tree->root);
        }
 
        send_notify(sub_tree, 1);
 
        ast_test_suite_event_notify(sub_tree->root->subscription_state == PJSIP_EVSUB_STATE_TERMINATED ?
-                       "SUBSCRIPTION_TERMINATED" : "SUBSCRIPTION_REFRESHED",
-                       "Resource: %s", sub_tree->root->resource);
+                               "SUBSCRIPTION_TERMINATED" : "SUBSCRIPTION_REFRESHED",
+                               "Resource: %s", sub_tree->root->resource);
 
        pjsip_dlg_dec_lock(dlg);
        ao2_cleanup(sub_tree);
@@ -3402,10 +3442,8 @@ static int serialized_pubsub_on_rx_refresh(void *userdata)
  * This includes both SUBSCRIBE requests that actually refresh the subscription
  * as well as SUBSCRIBE requests that end the subscription.
  *
- * In the case where the SUBSCRIBE is actually refreshing the subscription we
- * push a task to send an appropriate NOTIFY request. In the case where the
- * SUBSCRIBE is ending the subscription, we let the pubsub_on_evsub_state
- * callback take care of sending the terminal NOTIFY request instead.
+ * In either case we push serialized_pubsub_on_refresh_timeout to send an
+ * appropriate NOTIFY request.
  */
 static void pubsub_on_rx_refresh(pjsip_evsub *evsub, pjsip_rx_data *rdata,
                int *p_st_code, pj_str_t **p_st_text, pjsip_hdr *res_hdr, pjsip_msg_body **p_body)
@@ -3413,18 +3451,24 @@ static void pubsub_on_rx_refresh(pjsip_evsub *evsub, pjsip_rx_data *rdata,
        struct sip_subscription_tree *sub_tree;
 
        sub_tree = pjsip_evsub_get_mod_data(evsub, pubsub_module.id);
-       if (!sub_tree) {
+       if (!sub_tree || sub_tree->state != SIP_SUB_TREE_NORMAL) {
+               ast_debug(1, "Possible terminate race prevented %p %d\n", sub_tree, sub_tree ? sub_tree->state : -1 );
                return;
        }
 
        /* PJSIP will set the evsub's state to terminated before calling into this function
         * if the Expires value of the incoming SUBSCRIBE is 0.
         */
-       if (pjsip_evsub_get_state(sub_tree->evsub) != PJSIP_EVSUB_STATE_TERMINATED) {
-               if (ast_sip_push_task(sub_tree->serializer, serialized_pubsub_on_rx_refresh, ao2_bump(sub_tree))) {
-                       /* If we can't push the NOTIFY refreshing task...we'll just go with it. */
-                       ao2_ref(sub_tree, -1);
-               }
+
+       if (pjsip_evsub_get_state(sub_tree->evsub) == PJSIP_EVSUB_STATE_TERMINATED) {
+               sub_tree->state = SIP_SUB_TREE_TERMINATE_PENDING;
+       }
+
+       if (ast_sip_push_task(sub_tree->serializer, serialized_pubsub_on_refresh_timeout, ao2_bump(sub_tree))) {
+               /* If we can't push the NOTIFY refreshing task...we'll just go with it. */
+               ast_log(LOG_ERROR, "Failed to push task to send NOTIFY.\n");
+               sub_tree->state = SIP_SUB_TREE_NORMAL;
+               ao2_ref(sub_tree, -1);
        }
 
        if (sub_tree->is_list) {
@@ -3435,9 +3479,9 @@ static void pubsub_on_rx_refresh(pjsip_evsub *evsub, pjsip_rx_data *rdata,
 static void pubsub_on_rx_notify(pjsip_evsub *evsub, pjsip_rx_data *rdata, int *p_st_code,
                pj_str_t **p_st_text, pjsip_hdr *res_hdr, pjsip_msg_body **p_body)
 {
-       struct ast_sip_subscription *sub = pjsip_evsub_get_mod_data(evsub, pubsub_module.id);
+       struct ast_sip_subscription *sub;
 
-       if (!sub) {
+       if (!(sub = pjsip_evsub_get_mod_data(evsub, pubsub_module.id))) {
                return;
        }
 
@@ -3450,45 +3494,62 @@ static int serialized_pubsub_on_client_refresh(void *userdata)
        struct sip_subscription_tree *sub_tree = userdata;
        pjsip_tx_data *tdata;
 
+       if (!sub_tree->evsub) {
+               ao2_cleanup(sub_tree);
+               return 0;
+       }
+
        if (pjsip_evsub_initiate(sub_tree->evsub, NULL, -1, &tdata) == PJ_SUCCESS) {
                pjsip_evsub_send_request(sub_tree->evsub, tdata);
        } else {
                pjsip_evsub_terminate(sub_tree->evsub, PJ_TRUE);
        }
+
        ao2_cleanup(sub_tree);
        return 0;
 }
 
 static void pubsub_on_client_refresh(pjsip_evsub *evsub)
 {
-       struct sip_subscription_tree *sub_tree = pjsip_evsub_get_mod_data(evsub, pubsub_module.id);
+       struct sip_subscription_tree *sub_tree;
+
+       if (!(sub_tree = pjsip_evsub_get_mod_data(evsub, pubsub_module.id))) {
+               return;
+       }
 
-       ao2_ref(sub_tree, +1);
-       ast_sip_push_task(sub_tree->serializer, serialized_pubsub_on_client_refresh, sub_tree);
+       if (ast_sip_push_task(sub_tree->serializer, serialized_pubsub_on_client_refresh, ao2_bump(sub_tree))) {
+               ao2_cleanup(sub_tree);
+       }
 }
 
 static void pubsub_on_server_timeout(pjsip_evsub *evsub)
 {
+       struct sip_subscription_tree *sub_tree;
 
-       struct sip_subscription_tree *sub_tree = pjsip_evsub_get_mod_data(evsub, pubsub_module.id);
-       if (!sub_tree) {
-               /* PJSIP does not terminate the server timeout timer when a SUBSCRIBE
-                * with Expires: 0 arrives to end a subscription, nor does it terminate
-                * this timer when we send a NOTIFY request in response to receiving such
-                * a SUBSCRIBE. PJSIP does not stop the server timeout timer until the
-                * NOTIFY transaction has finished (either through receiving a response
-                * or through a transaction timeout).
-                *
-                * Therefore, it is possible that we can be told that a server timeout
-                * occurred after we already thought that the subscription had been
-                * terminated. In such a case, we will have already removed the sub_tree
-                * from the evsub's mod_data array.
-                */
+       /* PJSIP does not terminate the server timeout timer when a SUBSCRIBE
+        * with Expires: 0 arrives to end a subscription, nor does it terminate
+        * this timer when we send a NOTIFY request in response to receiving such
+        * a SUBSCRIBE. PJSIP does not stop the server timeout timer until the
+        * NOTIFY transaction has finished (either through receiving a response
+        * or through a transaction timeout).
+        *
+        * Therefore, it is possible that we can be told that a server timeout
+        * occurred after we already thought that the subscription had been
+        * terminated. In such a case, we will have already removed the sub_tree
+        * from the evsub's mod_data array.
+        */
+
+       sub_tree = pjsip_evsub_get_mod_data(evsub, pubsub_module.id);
+       if (!sub_tree || sub_tree->state != SIP_SUB_TREE_NORMAL) {
+               ast_debug(1, "Possible terminate race prevented %p %d\n", sub_tree, sub_tree ? sub_tree->state : -1 );
         return;
        }
 
-       ao2_ref(sub_tree, +1);
-       ast_sip_push_task(sub_tree->serializer, serialized_pubsub_on_server_timeout, sub_tree);
+       sub_tree->state = SIP_SUB_TREE_TERMINATE_PENDING;
+       if (ast_sip_push_task(sub_tree->serializer, serialized_pubsub_on_refresh_timeout, ao2_bump(sub_tree))) {
+               sub_tree->state = SIP_SUB_TREE_NORMAL;
+               ao2_cleanup(sub_tree);
+       }
 }
 
 static int ami_subscription_detail(struct sip_subscription_tree *sub_tree,
index 2cc18bf..67ac04d 100644 (file)
@@ -44,4 +44,5 @@ AC_DEFUN([PJPROJECT_CONFIGURE],
        PJPROJECT_SYMBOL_CHECK([PJ_SSL_CERT_LOAD_FROM_FILES2], [pj_ssl_cert_load_from_files2], [pjlib.h])
        PJPROJECT_SYMBOL_CHECK([PJSIP_EXTERNAL_RESOLVER], [pjsip_endpt_set_ext_resolver], [pjsip.h])
        AC_DEFINE([HAVE_PJSIP_TLS_TRANSPORT_PROTO], 1, [Define if your system has PJSIP_TLS_TRANSPORT_PROTO])
+       AC_DEFINE([HAVE_PJSIP_EVSUB_GRP_LOCK], 1, [Define if your system has PJSIP_EVSUB_GRP_LOCK])
 ])
diff --git a/third-party/pjproject/patches/0001-evsub-Add-APIs-to-add-decrement-an-event-subscriptio.patch b/third-party/pjproject/patches/0001-evsub-Add-APIs-to-add-decrement-an-event-subscriptio.patch
new file mode 100644 (file)
index 0000000..d2a47c6
--- /dev/null
@@ -0,0 +1,73 @@
+From a5030c9b33b2c936879fbacb1d2ea5edc2979181 Mon Sep 17 00:00:00 2001
+From: George Joseph <gjoseph@digium.com>
+Date: Sat, 18 Jun 2016 10:14:34 -0600
+Subject: [PATCH] evsub:  Add APIs to add/decrement an event  subscription's
+ group lock
+
+These APIs can be used to ensure that the evsub isn't destroyed before
+an application is finished using it.
+---
+ pjsip/include/pjsip-simple/evsub.h | 20 ++++++++++++++++++++
+ pjsip/src/pjsip-simple/evsub.c     | 14 ++++++++++++++
+ 2 files changed, 34 insertions(+)
+
+diff --git a/pjsip/include/pjsip-simple/evsub.h b/pjsip/include/pjsip-simple/evsub.h
+index 2dc4d69..31f85f8 100644
+--- a/pjsip/include/pjsip-simple/evsub.h
++++ b/pjsip/include/pjsip-simple/evsub.h
+@@ -490,6 +490,26 @@ PJ_DECL(void) pjsip_evsub_set_mod_data( pjsip_evsub *sub, unsigned mod_id,
+ PJ_DECL(void*) pjsip_evsub_get_mod_data( pjsip_evsub *sub, unsigned mod_id );
++/**
++ * Increment the event subscription's group lock.
++ *
++ * @param sub         The server subscription instance.
++ *
++ * @return            PJ_SUCCESS on success.
++ */
++PJ_DEF(pj_status_t) pjsip_evsub_add_ref(pjsip_evsub *sub);
++
++
++/**
++ * Decrement the event subscription's group lock.
++ *
++ * @param sub         The server subscription instance.
++ *
++ * @return            PJ_SUCCESS on success.
++ */
++PJ_DEF(pj_status_t) pjsip_evsub_dec_ref(pjsip_evsub *sub);
++
++
+ PJ_END_DECL
+diff --git a/pjsip/src/pjsip-simple/evsub.c b/pjsip/src/pjsip-simple/evsub.c
+index 7cd8859..68a9564 100644
+--- a/pjsip/src/pjsip-simple/evsub.c
++++ b/pjsip/src/pjsip-simple/evsub.c
+@@ -831,7 +831,21 @@ static pj_status_t evsub_create( pjsip_dialog *dlg,
+     return PJ_SUCCESS;
+ }
++/*
++ * Increment the event subscription's group lock.
++ */
++PJ_DEF(pj_status_t) pjsip_evsub_add_ref(pjsip_evsub *sub)
++{
++    return pj_grp_lock_add_ref(sub->grp_lock);
++}
++/*
++ * Decrement the event subscription's group lock.
++ */
++PJ_DEF(pj_status_t) pjsip_evsub_dec_ref(pjsip_evsub *sub)
++{
++    return pj_grp_lock_dec_ref(sub->grp_lock);
++}
+ /*
+  * Create client subscription session.
+-- 
+2.5.5
+