implement pthread_workqueue within libdispatch #206
Conversation
This is only a first pass, and I haven't tried to grok the algorithms yet.
My biggest remark is with naming. Short term, FWIW, I would have namespaced everything in that module as _dispatch_wq_*, which would yield function names such as _dispatch_wq_add_workqueue instead of _dispatch_workqueue_add_wq.
Second, a workqueue is not the pool; it is the thread that is used to run queues. I think the pools should be called thread pools (dispatch_wq_pool is fine since, as said, a work queue is a thread, so there is no need to qualify it as dispatch_wq_thread_pool), or some such, to make the code easier to read for newcomers.
If you don't like dispatch_wq_ you can use dispatch_workq_, which isn't that much longer and would pair nicely with the namespace of many of the pre-existing WORKQ_* flags of the interface.
I'll try to have a more thoughtful look at the various algorithms, atomics barriers and the like later, but that all looks very promising!
src/queue.c (outdated):

    @@ -159,7 +160,7 @@ struct dispatch_root_queue_context_s {
     #if HAVE_PTHREAD_WORKQUEUES
     	qos_class_t dgq_qos;
     	int dgq_wq_priority, dgq_wq_options;
    -#if DISPATCH_USE_LEGACY_WORKQUEUE_FALLBACK || DISPATCH_USE_PTHREAD_POOL
    +#if DISPATCH_USE_LEGACY_WORKQUEUE_FALLBACK || DISPATCH_USE_PTHREAD_POOL || HAVE_INTERNAL_PTHREAD_WORKQUEUE
80 col wrap
src/queue.c (outdated):

    @@ -719,7 +720,7 @@ _dispatch_root_queues_init_workq(int *wq_supported)
     		result = !r;
     	}
     #endif // HAVE_PTHREAD_WORKQUEUE_SETDISPATCH_NP
    -#if DISPATCH_USE_LEGACY_WORKQUEUE_FALLBACK || DISPATCH_USE_PTHREAD_POOL
    +#if DISPATCH_USE_LEGACY_WORKQUEUE_FALLBACK || DISPATCH_USE_PTHREAD_POOL || HAVE_INTERNAL_PTHREAD_WORKQUEUE
ditto
src/queue.c (outdated):

    #if HAVE_INTERNAL_PTHREAD_WORKQUEUE
    	if (!disable_wq && qc->dgq_wq_priority != WORKQ_PRIO_INVALID) {
    		int priority = qc->dgq_wq_priority;
    		int overcommit = qc->dgq_wq_options & WORKQ_ADDTHREADS_OPTION_OVERCOMMIT;
80 columns
src/workqueue/workqueue.c (outdated):

    @@ -0,0 +1,849 @@
    +/*
    + * Copyright (c) 2008-2017 Apple Inc. All rights reserved.
that 2008 looks incorrect
src/workqueue/workqueue.c (outdated):

     */

    /////////////////////////////////
we never use //// like that; the rule is /* */ for things outside of functions and // comments when inline inside a function. Here what you mean is:

    #pragma mark - ...

which is an Xcode thing but is easy to wire in vim ;)
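A minimal sketch of those conventions in one place (the function name is hypothetical):

    #pragma mark - worker lifecycle

    /*
     * Outside of function bodies, documentation uses classic block
     * comments like this one.
     */
    static void
    _dispatch_wq_example(void)
    {
    	// inline comments inside a function use the // form
    }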
src/workqueue/workqueue.c (outdated):

    	if (kext_active) {
    		_dispatch_spawn_thread(dispatch_thread_monitor_main_kext);
    	} else
just do an early return and remove the weird } else, and un-indent the code below it.
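A minimal sketch of the suggested early-return shape, reusing the names from the snippet above:

    	if (kext_active) {
    		_dispatch_spawn_thread(dispatch_thread_monitor_main_kext);
    		return;
    	}
    	// the non-kext path continues here, one indentation level shallower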
src/workqueue/workqueue.c (outdated):

    	if (kext_active) {
    		_dispatch_worker_register_kext();
    		rc = 0;
    	} else
ditto
src/workqueue/workqueue.c (outdated):

    #if DISPATCH_ENABLE_PWQ_KEXT
    	if (kext_active) {
    		_dispatch_worker_register_kext();
    	} else
ditto
(and two times more below)
src/workqueue/workqueue.c (outdated):

    		scheduler.registered_workers[last] = 0;
    		scheduler.num_registered_workers--;
    	} else {
    		if (fcntl(fd, F_SETFL, O_NONBLOCK) == 0) {
doesn't linux support O_NONBLOCK as an argument to open() these days?
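A minimal sketch of the suggested alternative - requesting O_NONBLOCK at open() time instead of a follow-up fcntl() call (the path here is illustrative):

    #include <fcntl.h>

    // the descriptor is non-blocking from the start; no fcntl() round trip
    int fd = open("/proc/self/stat", O_RDONLY | O_NONBLOCK);
    if (fd < 0) {
    	// handle the error
    }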
src/workqueue/workqueue.c (outdated):

    	return NULL;
    }

    	while(true) {
we tend to use for (;;) { for this
src/workqueue/workqueue_internal.h (outdated):

    typedef struct _dispatch_workqueue * pthread_workqueue_t;

    /* Work queue priority attributes. */
    #define WORKQ_HIGH_PRIOQUEUE 0
I'm not 100% sure that's the right interface, I think we should build something on QoS instead of the legacy interface.
I pushed a revision that tries to address most of the review comments. The main things I know I didn't do in this revision are (a) dealing with the code duplication in the normal/overcommit worker main functions, (b) the legacy API (6 priorities, not QoS), and (c) some of the overuse of 'queues'.

Stepping back, I think the next big step after this PR is to try to move forward to an API where we avoid the extra level of indirection of having the workq level allocate its own workitems that just wrap a root_queue_drain function on a dq, and have workq go after the root_queues more directly. If we can do that, then the duplicated code changes quite a bit (and maybe mostly disappears). I can take a look in more depth, but I'm tempted to leave the duplication alone since I'm hoping that in a week or so all of that code is different anyway. But if you disagree, I don't mind attempting a cleanup to cut down on the duplication tomorrow.
Just a short comment based on some previous experience here (just as food for thought for such future optimisations and enhancements): it would be nice to let e.g. a class reserve the space needed for enqueueing it, to avoid potential memory allocations both in dispatch and in the work queue code - it would be helpful for certain interesting use cases. Anyway, nice to see this WIP!
src/workqueue/workqueue.c (outdated):

    	int priority;
    	bool overcommit;
    } dispatch_workq_workqueue_s, *dispatch_workq_workqueue_t;
I'm not thrilled with the name dispatch_workq_workqueue_t, but am at a bit of a loss for anything better. Fortunately, this should be a temporary type. If we can eliminate the dispatch_workq_item_t, we won't need this type either (we won't need to have queues of dispatch_workq_item_t's).
I would call this a dispatch_workq_thread_context_t or even dispatch_workq_context_t.
Wait, no, I'm confused - that's the thing queued at the bottom for the picking. Hmmm. I'm quite unsure why you need that type; I'll have to pay attention to the algorithm in depth, and I haven't had that kind of time quite yet, so let's ignore my comments ;)
went with dispatch_workq_context_t for now (missed the edited comment).
I don't expect to actually need this type in the long run. For the initial merge, I was staying pretty close to the libpwq structure which has a redundant layer of queuing. I was planning on eliminating that redundancy in a second round of changes. We can talk on Monday about whether this plan makes sense or if we should jump to something closer to the final structure in one big step.
I'll work on rebasing this PR to current master and updating to use dispatch_priority_t tomorrow. Then we can figure out whether it makes sense to merge something like this in the near future, or if we want to go directly to a more optimized version that eliminates the extra level of workitems/queuing that was present in libpwq and replicated here. In either case, I don't think we should rush this into Swift 3.1.0 (maybe 3.1.1 if there is one).
I think it's too late for Swift 3.1. My experience with thread pools is that the evil you know is better than brand new code; we should, though, start right away for Swift 4, and if there's a Swift 3.1+ then we can always backport it at that time.
(force-pushed from cf4c265 to 99b8625)
I did rebase this to post epoll-merge master today. But, unless we are in a rush to eliminate the libpwq dependency, I think it probably makes sense for me to rework to non-legacy APIs and try to eliminate the extra level of workitems/queuing before you review further.
Agreed.
oh and btw, now that we have src/event/, I would move this code there: the eventing and threading subsystems are quite tied together, and that puts everything that is strongly tied to the kernel in its own place.
(force-pushed from fe6185a to 534dd0a)
@MadCoder, I've done a rework to something that I think is more plausible. Much less new code, and the extra level of queuing/workitems inherited from libpwq is gone. The oversubscription heuristics may still need tweaking to maximize performance, especially at small core counts (I'm running some experiments over the weekend), but I think the approach/algorithms are ready for review when you have time.

The basic idea is to use the pthread_root_queue functionality already in libdispatch, augmented with new code that periodically determines how many worker threads are actually runnable. If it looks like we are under target, then additional workers are gradually created to modestly oversubscribe the thread pool (see the sketch below). The main changes in queue.c are making dgq_thread_pool_size signed, to enable using negative values to encode the degree of oversubscription, and some other minor adjustments. As written, it is only monitoring/oversubscribing the default non-overcommit queue. But if the approach is acceptable and we want to be able to oversubscribe the other non-overcommit queues, the approach should generalize OK.
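A minimal sketch of the Linux-specific runnable check such a monitor could rely on, assuming the per-worker tids are known; the helper name and parsing are illustrative, not the PR's actual code:

    #include <stdio.h>
    #include <stdbool.h>
    #include <sys/types.h>

    /*
     * Hypothetical helper: report whether the thread with the given tid is
     * currently runnable by reading the state field of
     * /proc/self/task/<tid>/stat. A monitor can call this for each
     * registered worker and compare the runnable total against its target.
     */
    static bool
    example_thread_is_runnable(pid_t tid)
    {
    	char path[64];
    	char state = '\0';
    	snprintf(path, sizeof(path), "/proc/self/task/%d/stat", (int)tid);
    	FILE *f = fopen(path, "r");
    	if (!f) return false;
    	// skip the tid and the (comm) field, then read the state character;
    	// a robust parser must also handle spaces inside the (comm) field
    	if (fscanf(f, "%*d %*s %c", &state) != 1) state = '\0';
    	fclose(f);
    	return state == 'R'; // 'R' means running/runnable
    }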
@dgrove-oss sounds nice. I won't likely have the time to do the review it requires before mid-next week, but I'll try to skim through it over the weekend.
(force-pushed from 0ea540c to 12f6f01)
(force-pushed from 2fb99da to e80a9dd)
@MadCoder, I got busy with a few other things for the last month, but I have time again for the next few weeks. If you could find time to review, I'd really like to push through the libpwq removal work so we have plenty of time to shake out any problems before the next release. Thanks!
I was very busy too but should have some bandwidth early next week; I'll have a closer look then.
On top of all the comments I made, I see several issues with that branch that need addressing IMO
- we need one regulator per priority "bucket" and maybe even a global one (full bucket isolation was always quite the mistake in dispatch's initial design IMO)
- you're not taking into account whether thread requests are in flight, to maintain concurrency and in other places, which will cause idle processes to burn CPU for no good reason; the whole monitor/regulator logic should use it more. I see that you shortcut starting new threads if !_dispatch_workq_root_queue_has_work(), but that is not sufficient and is, I think, racy. I need to think about that more. It kind of goes with the notion I have that the queue.c code should know when the root queue implements a real workqueue bucket and do something special with the workqueue.c code when the thing becomes non-empty. In any case, the timer that goes with it should be suspended if there's no work, and restarted if there is and there is no thread creation in flight.
- which segues into the fact that you need to account for threads being created: pre-post thread creation when you do it, and account for these threads until you observe them registering with the monitor (see the sketch after this list). Otherwise, if thread creation is stalling for any reason (and given how expensive it is, it has tons of reasons to), you will end up with thread explosion. A huge lesson we learned in dispatch is that the thread pool should not create new threads in a given bucket until it has observed that thread reach userland. You need to implement this policy.
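A minimal sketch of the pre-posted accounting policy described above, with hypothetical names; the count is bumped before pthread_create() and only released once the new worker checks in from userland, so a stalled creation holds off further spawns:

    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdbool.h>

    // threads created but not yet observed to reach userland
    static atomic_int example_pending_creates;

    // hypothetical spawn path: pre-post the thread before creating it
    static bool
    example_spawn_worker(void *(*worker_main)(void *))
    {
    	pthread_t pth;
    	atomic_fetch_add(&example_pending_creates, 1);
    	if (pthread_create(&pth, NULL, worker_main, NULL) != 0) {
    		atomic_fetch_sub(&example_pending_creates, 1); // undo on failure
    		return false;
    	}
    	return true;
    }

    // hypothetical worker prologue: the regulator counts the thread as
    // "arrived" only once it runs this, closing the in-flight window during
    // which no additional threads should be created for the bucket
    static void
    example_worker_did_start(void)
    {
    	atomic_fetch_sub(&example_pending_creates, 1);
    }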
src/Makefile.am (outdated):

    @@ -77,6 +84,7 @@ AM_OBJCFLAGS=$(DISPATCH_CFLAGS) $(CBLOCKS_FLAGS)
     AM_CXXFLAGS=$(PTHREAD_WORKQUEUE_CFLAGS) $(DISPATCH_CFLAGS) $(CXXBLOCKS_FLAGS)
     AM_OBJCXXFLAGS=$(DISPATCH_CFLAGS) $(CXXBLOCKS_FLAGS)

    +if !HAVE_INTERNAL_PTHREAD_WORKQUEUES
nit: I think I'd move that up as an else. May not be relevant for very long if we move to cmake long term anyway.
src/shims.h (outdated):

    @@ -42,8 +42,12 @@
     #if __has_include(<pthread/workqueue_private.h>)
     #include <pthread/workqueue_private.h>
     #else
    +#if HAVE_INTERNAL_PTHREAD_WORKQUEUE
nit: #elif?
    	pthread_workqueue_t dgq_kworkqueue;
    #endif
    #endif // HAVE_PTHREAD_WORKQUEUES
    #if DISPATCH_USE_PTHREAD_POOL
    	void *dgq_ctxt;
    -	uint32_t volatile dgq_thread_pool_size;
    +	int32_t volatile dgq_thread_pool_size;
why?
ok I think I get it, your "oversubscribe" thing is about pretending the threads don't exist because they are blocked, so you need to allow negative numbers, makes sense.
src/queue.c (outdated):

    -		(uint8_t)(flags & ~_DISPATCH_PTHREAD_ROOT_QUEUE_FLAG_POOL_SIZE) : 0;
    +	uint8_t pool_size_u = flags & _DISPATCH_PTHREAD_ROOT_QUEUE_FLAG_POOL_SIZE ?
    +		(int8_t)(flags & ~_DISPATCH_PTHREAD_ROOT_QUEUE_FLAG_POOL_SIZE) : 0;
    +	int32_t pool_size = (int32_t)(uint32_t)pool_size_u;
you don't need the double cast; uint8_t promotes naturally to any 32-bit integer type (signed or not)
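For illustration, the promotion in question (variable names borrowed from the snippet above):

    	uint8_t pool_size_u = 8;
    	int32_t pool_size = pool_size_u; // implicit widening; no casts needed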
src/queue.c (outdated):

    @@ -2071,6 +2077,12 @@ dispatch_pthread_root_queue_copy_current(void)
     	return (dispatch_queue_t)_os_object_retain_with_resurrect(dq->_as_os_obj);
     }

    +int32_t
    +_dispatch_pthread_root_queue_thread_pool_size(dispatch_queue_t dq) {
brace on the next line please
src/event/workqueue.c (outdated):

    bool
    dispatch_workq_worker_register(dispatch_queue_t root_q)
    {
    	if (root_q != _dispatch_workq_get_default_root_queue()) {
why? it feels like you want to do this for any root queue that is in the "root queue array", as in _dispatch_is_in_root_queues_array(), and have several monitors (one per bucket). I would also make this check the responsibility of the caller, since you hook that into pthread root queues and these should know what kind of thing they implement (custom pthread root queues, or the backend for a workqueue pool).
src/queue.c (outdated):

    @@ -5475,6 +5517,10 @@ _dispatch_worker_thread(void *context)
     	(void)dispatch_assume_zero(r);
     	_dispatch_introspection_thread_add();

    +#if HAVE_INTERNAL_PTHREAD_WORKQUEUE
    +	bool registered = dispatch_workq_worker_register(dq);
as I commented in the .c, I think this should only be called if dq is a workqueue root queue; this code should (and has to) know which kind of thing it's implementing rather than doing spurious calls here IMO
src/event/workqueue.c (outdated):

    	}

    	_dispatch_workq_count_runnable_workers(mgr);
    	int32_t count = _dispatch_pthread_root_queue_thread_pool_size(dq);
given that you cheat with that count when you "over-subscribe" and that it is inherently an atomic value that is constantly moving, I'd really only use things that the monitor knows from under the lock
IOW that count should only be about "do not go above the max concurrency" in the pthread root queue code, but not used for logic here at all.
(it goes with my general review remark about the fact you need to count threads in a pre-posted way anyway to avoid in-flight thread explosion)
src/event/workqueue.c (outdated):

    	_dispatch_debug("workq: %s is non-empty with pool count %d (%d runnable)",
    			dq->dq_label, count, mgr->runnable_workers);

    	if (mgr->runnable_workers < mgr->target_runnable_workers) {
you want to do that only if there are pending thread requests
src/queue.c (outdated):

    		_dispatch_root_queue_debug("pthread pool is full for root queue: "
    				"%p", dq);
    		return;
    	}
    -	j = i > t_count ? t_count : i;
    +	j = (int32_t)i > t_count ? t_count : (int32_t)i;
please avoid this cast by making i signed, which will mean fixing _dispatch_global_queue_poke to take an int too, all the way down to the root of that need, which is da_thr_cnt. This is fine, as dispatch really can only manage on the order of thousands of threads, so making this signed is not an issue, and it would avoid these ugly-looking casts.
Thanks Pierre! I'll work through the detailed comments; on first read I think I understand your points. On the overall comments, I agree we need to have a regulator for all the priority buckets. Since it looks like the overall approach is plausible, I'll go ahead and do that as part of the next revision. I was stalling on tackling all the buckets until I was sure that trying to use dispatch's pthread_root_queue code wasn't going to be a non-starter. On overall points 2 & 3, I agree there are timing holes in the code where it could easily create extra threads (or be slow by one timer invocation of the monitor to notice that the system was totally idle due to blocked workers and needed to be kicked). The libpwq code did do the more complicated in-flight accounting you described to contain thread explosions. I should be able to do something similar.
FWIW, for the regulators: overcommit doesn't need one; the root queue implementation as it is is "good enough". The regulator is only really meaningful for non-overcommit buckets. However, the overcommit workqueue threads may want to be accounted for globally, to have a global thread limit (like Darwin has) that spans several buckets. Our belief today, TBH, is that the limit for threads and concurrency should encompass any thread that is at a higher priority - meaning that if you are already using NCPU runnable threads at the highest priority, other non-overcommit buckets should get nothing. The current approach you're taking doesn't allow that, but I wouldn't try to do it on the first try either; I'm just explaining what we believe to be the right direction overall, should it influence some choices you're making.
I've made a pass through and handled almost all of the review comments. The open items are:
Thanks!
I want to test more thoroughly on Monday, but I realized the pending thread logic was actually already in place for the pthread_root_queues. I just needed to add the right cmpxchg on dgq_pending in the new oversubscribe routine and it should just work.
I think that the pthread root queues should have all it takes, and if we need more, we can certainly move the extra struct definition that is stuck at the end of a dispatch_queue_s to workqueue_internal.h if that helps with things. To me, "workqueue" is "below" src/queue.c in the same way event/event* is below source.c, so having workqueue.c call into queue.c is inverted layering. However, what you're doing so far feels OK to me. I had a hunch that dgq_pending has what you need, and I expect that when you give back threads, if you're above the original limit you can just cap the counter (see the sketch below). IOW if the limit was 10 and you oversubscribed by, say, 5, so you have 15 threads at the moment: as they go back, eventually the count will become 10, then 11, then ... then 15 - well, you want to drop anything that's above the limit of 10 (which may require keeping a copy of that original limit if it doesn't already exist as a thing).
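A minimal sketch of the capping behavior being described, with hypothetical names; a retiring worker returns its pool slot only while the counter is below the original limit, so oversubscribed slots simply evaporate:

    #include <stdatomic.h>
    #include <stdbool.h>

    // returns true if the slot was returned, false if it was dropped
    // because the pool is already back at its original limit
    static bool
    example_worker_retire(atomic_int *pool_size, int original_limit)
    {
    	int old = atomic_load(pool_size);
    	do {
    		if (old >= original_limit) {
    			return false; // oversubscribed slot: drop it
    		}
    	} while (!atomic_compare_exchange_weak(pool_size, &old, old + 1));
    	return true;
    }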
Just a short comment on 2 for your consideration - not regarding the mechanism used for removing the threads, but rather the decision logic on when to remove them and how quickly. The implementation we did in libpwq used hysteresis for the backoff of threads, to avoid degenerate behaviour when you have workloads that ramp the thread count up/down within fairly short time spans. We just had a simple aggregation of the number of threads that were oversubscribed per time unit, and a heuristic of "x idle-thread-seconds" accumulated being required for the backoff of one thread - in practice it meant that if you had many extra threads idling, we would deallocate them more aggressively in the beginning and then taper off as we got closer to the desired number. Essentially the model used for heating system regulation (accumulating the delta between actual and desired temperatures over time and taking action at certain thresholds). This approach smoothed out the thread creation/deallocation problems we observed for certain workloads (initially we just naively deallocated threads immediately when there was a surplus) and it proved quite good in practice - hope the explanation makes sense.
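A minimal sketch of that accumulate-and-threshold idea, assuming a periodic monitor tick; the names and the 2.0 idle-thread-seconds threshold are illustrative, not libpwq's actual constants:

    // idle-thread-seconds a surplus worker must accumulate before one retires
    #define EXAMPLE_IDLE_SECONDS_PER_RETIRE 2.0

    static double example_idle_accum; // accumulated surplus, in thread-seconds

    // called once per monitor tick of `interval_sec` seconds; returns how
    // many surplus workers to retire this tick
    static int
    example_workers_to_retire(int current_workers, int target_workers,
    		double interval_sec)
    {
    	int surplus = current_workers - target_workers;
    	if (surplus <= 0) {
    		example_idle_accum = 0.0; // back at (or below) target: reset
    		return 0;
    	}
    	// a larger surplus accumulates faster, so backoff is aggressive at
    	// first and tapers off as the pool approaches the target size
    	example_idle_accum += (double)surplus * interval_sec;
    	int retire = (int)(example_idle_accum / EXAMPLE_IDLE_SECONDS_PER_RETIRE);
    	example_idle_accum -= (double)retire * EXAMPLE_IDLE_SECONDS_PER_RETIRE;
    	return retire;
    }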
@hassila thanks for the description of libpwq's control strategy; makes sense.
@MadCoder I've reworked the code and it is ready for you to take another look. In addition to the localized fixes, I've made some adjustments to how the workq_monitor system interacts with the global queues to address the oversubscription/pending-thread-create issue. As a result, the delta got smaller and, I think, easier to understand/maintain. The key change is that I added a 'floor' parameter to _dispatch_global_queue_poke, enabling it to be used both for "normal" poke operations (floor of 0) and for oversubscription requests (floor < 0), thus funneling all the thread creation requests through a single code path (see the sketch below).

Right now, the only mechanism for reducing the thread pool size is the 5-second timeout on idle threads. Looking through the code in queue.c, I was wondering if a possible approach would be to adapt the HAVE_PTHREAD_WORKQUEUE_NARROWING path to also work on Linux with the workq_monitor (if the monitor sees overload, it would post a narrow request)? Unless you think it is critical to get this support into the initial changes, I'd be inclined to handle this in a follow-up change.
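A minimal sketch of how a floor parameter can funnel both paths through one slot-claiming routine; the semantics follow the description above, but the names and body are illustrative, not the PR's actual poke code:

    #include <stdatomic.h>

    // claim up to `n` new worker slots, allowing the remaining-slot counter
    // to go as low as `floor` (0 for a normal poke; negative to let the
    // monitor oversubscribe the pool by up to -floor threads)
    static int
    example_poke_claim_slots(atomic_int *remaining, int n, int floor)
    {
    	int old = atomic_load(remaining);
    	int claimed;
    	do {
    		claimed = (old - floor < n) ? old - floor : n;
    		if (claimed <= 0) {
    			return 0; // at or below the floor: spawn nothing
    		}
    	} while (!atomic_compare_exchange_weak(remaining, &old, old - claimed));
    	return claimed; // caller spawns this many workers
    }

    // enqueue-side poke:   example_poke_claim_slots(&slots, 1, 0);
    // monitor-side poke:   example_poke_claim_slots(&slots, 1, -2);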
@MadCoder, would be great if you could take a look at the current proposed changes this week and let me know if there is more that is needed before it can be tested/merged. I would like to get this (and ideally the followup changes to modernize the priority/QoS API on Linux) into the Swift 4 beta. Thanks!
I don't think it's urgent to handle the HAVE_PTHREAD_WORKQUEUE_NARROWING case for now.
configure.ac (outdated):

    )
    AM_CONDITIONAL(HAVE_INTERNAL_PTHREAD_WORKQUEUES, $have_internal_pthread_workqueues)
DISPATCH_USE_INTERNAL_WORKQUEUE as well ? (also avoids the confusing singular/plural difference between the define and the conditional)
src/queue.c (outdated):

    @@ -32,6 +32,7 @@
     #endif
     #if HAVE_PTHREAD_WORKQUEUES && (!HAVE_PTHREAD_WORKQUEUE_QOS || DISPATCH_DEBUG) && \
     	!HAVE_PTHREAD_WORKQUEUE_SETDISPATCH_NP && \
    +	!HAVE_INTERNAL_PTHREAD_WORKQUEUE && \
I'm not very clear on why HAVE_PTHREAD_WORKQUEUES is true for the internal workqueue case; that doesn't seem quite right - that define is intended to guard the use of the specific pthread_workqueue API being used (which the internal version doesn't emulate). It would be easier to understand if the cases where you need something that is guarded by HAVE_PTHREAD_WORKQUEUES just added a || HAVE_INTERNAL_PTHREAD_WORKQUEUE instead (and then you wouldn't need the confusing addition here).
Alternatively, could add a new DISPATCH_USE_WORKQUEUES at the top here that guards the specific bits that are needed for both HAVE_PTHREAD_WORKQUEUES && HAVE_INTERNAL_PTHREAD_WORKQUEUE
resp s/HAVE_INTERNAL_PTHREAD_WORKQUEUE/DISPATCH_USE_INTERNAL_WORKQUEUE/ as discussed elsewhere
src/queue.c (outdated):

    @@ -635,7 +639,8 @@ _dispatch_root_queues_init_workq(int *wq_supported)
     		result = !r;
     	}
     #endif // HAVE_PTHREAD_WORKQUEUE_SETDISPATCH_NP
    -#if DISPATCH_USE_LEGACY_WORKQUEUE_FALLBACK || DISPATCH_USE_PTHREAD_POOL
    +#if DISPATCH_USE_LEGACY_WORKQUEUE_FALLBACK || DISPATCH_USE_PTHREAD_POOL || \
    +		HAVE_INTERNAL_PTHREAD_WORKQUEUE
shouldn't be necessary, you already have DISPATCH_USE_PTHREAD_POOL defined
src/queue.c (outdated):

    	// because the fastpath of _dispatch_global_queue_poke didn't
    	// know yet that we're using the internal pool implementation
    	// we have to undo its setting of dgq_pending
    	qc->dgq_pending = 0;
don't you want to do this only if !pwq? (to match the check in _dispatch_global_queue_poke) Otherwise you'll wipe out the pending info for the legacy fallback above.
this is really the only one of my comments that requires fixing, the rest of my comments are more about fitting in better with the existing configuration/define scheme and some style nits that we can also address later if necessary
src/queue.c (outdated):

    @@ -5475,6 +5509,15 @@ _dispatch_worker_thread(void *context)
     	(void)dispatch_assume_zero(r);
     	_dispatch_introspection_thread_add();

    +#if HAVE_INTERNAL_PTHREAD_WORKQUEUE
    +	bool overcommit = qc->dgq_wq_options & WORKQ_ADDTHREADS_OPTION_OVERCOMMIT;
    +	bool manager = dq == &_dispatch_mgr_root_queue;
please add ()
a few minor comments, but overall the additions to the existing code look great to me!
(force-pushed from af4a6d9 to 8b235c5)
Thanks! I guarded the clearing of qc->dgq_pending, renamed to DISPATCH_USE_INTERNAL_WORKQUEUE, and added () at lines 5513 and 5514. Squashed back into a single commit. I'd like to clean up HAVE_PTHREAD_WORKQUEUES being defined when using the internal workqueue in a follow-up PR; I think it would make sense to look at this and at switching to more of a QoS-based priority definition (Pierre's comment about the priority levels being defined in workqueue_internal.h) as a single change.
looks great, thanks! cleaning up the conflation with HAVE_PTHREAD_WORKQUEUES as a followup makes sense to me.
@swift-ci please test
@swift-ci please test
Looks like the build got into a confused state about whether libpwq was included or not. I'm going to try pulling in dgrove-oss@626f8a8, which removes libpwq entirely, and see if that is enough. If it isn't, we might need to figure out how to force a clean workspace on the build machines.
@swift-ci please test
@swift-ci please test
Yeah! Build worked. Do you want me to squish to 1 commit, or leave as two?
please squash it all and I'll merge.
Provide an implementation of the pthread_workqueue functionality (libpwq) as a fully integrated component of libdispatch. Integration of the workqueue implementation into the libdispatch code base simplifies the process of evolving the APIs between the two layers and thus prepares for future optimization and enhancements. The overall design is to augment libdispatch's existing pthread_root_queue option with periodic user-space monitoring of the status of a root queue's worker threads and to oversubscribe the workqueue when not enough workers are observed to be runnable when the queue has available work. Initially, the integrated pthread_workqueue is only enabled by default on Linux. The current monitoring implementation relies on Linux-specific code to dynamically estimate the number of runnable worker threads by reading /proc. Porting the monitoring code to non-Linux platforms would entail providing similar functionality for those platforms. Remove libpwq git submodule and autotools support for building libpwq from source as part of building libdispatch.
(force-pushed from 897ab1a to 703b1d5)
squashed; ready to merge. Thanks!
this causes build failures on Darwin due to numerous unsigned-vs-signed build warnings (which we have enabled as fatal) from the uint32_t -> int32_t change.
implement pthread_workqueue within libdispatch

Signed-off-by: Daniel A. Steffen <[email protected]>
Provide an implementation of the pthread_workqueue
functionality as a fully integrated component of
libdispatch. Integration of the workqueue implementation
into the dispatch code base simplifies the process of
evolving the APIs between the two layers and thus
prepares for future optimization and enhancements.
Initially, the integrated pthread_workqueue is only
enabled by default on Linux. Most of the internals are
built on pthread APIs and thus should be fairly portable.
However, Linux-specific code is used to dynamically estimate
the number of runnable worker threads by reading /proc.
Porting the thread pool management code to non-Linux
platforms would entail providing similar functionality
for those platforms (or otherwise restructuring the manager).