1 Introduction
In the classic secretary problem, items appear in random order. We know , but don’t know the value of an item until it appears. Once an item arrives we have to irrevocably and immediately decide whether or not to select it. Only one item is allowed to be selected, and the objective is to select the most valuable item, or perhaps to maximize the expected value of the selected item [10, 14, 22]. It is well known that the optimal policy is to observe the first items without making any selection and then select the first item whose value is larger than the value of the best item in the first items [10]. This algorithm, given by Dynkin [10]
, is asymptotically optimal, and hires the best secretary with probability at least
. Hence it is also competitive for the expected value of the chosen item, and it can be shown that no algorithm can beat competitive ratio in expectation. Many variants and generalizations of the secretary problem have been studied in the literature, see e.g., [2, 30, 28, 31, 20, 3]. [20, 3] introduced a multiple choice secretary problem, where the goal is to select items in a randomly ordered input so as to maximize the sum of their values; and Kleinberg [20] gave an algorithm with an asymptotic competitive ratio of . Thus as , the competitive ratio approaches 1. Recent literature studied several generalizations of this setting to multidimensional knapsacks [24], and proposed algorithms for which the expected online solution approaches the best offline solution as the knapsack sizes becomes large (e.g., [12, 9, 1]). In another variant of multiplechoice secretary problem, Bateni et al. [5] and Gupta et al. [15] introduce the submodular secretary problem. In this secretary problem, the algorithm again selects items, but the value of the selected items is given by a monotone submodular function . The algorithm has a value oracle access to the function, i.e., for any given set , an algorithm can query an oracle to find its value [29]. The algorithm can select at most items , from a randomly ordered sequence of items. The goal is to maximize . Currently, the best result for this setting is due to Kesselheim and Tönnis [19], who achieve a competitive ratio in exponential time, or in polynomial time. In this case, the offline problem is NPhard and hardto approximate beyond the factor of achieved by the greedy algorithm [25]. However, it is unclear if a competitive ratio of can be achieved by an online algorithm for the submodular secretary problem even when is large.Our model: secretary problem with shortlists.
In this paper, we consider a relaxation of the secretary problem where the algorithm is allowed to select a shortlist of items that is larger than the number of items that ultimately need to be selected. That is, in a multiplechoice secretary problem with cardinality constraint , the algorithm is allowed to choose more than items as part of a shortlist. Then, after seeing the entire input, the algorithm can choose a subset of size from the bigger set of items in the shortlist. This new model is motivated by some practical applications of secretary problems, such as hiring (or assignment problems), where in some cases it may be possible to tentatively accept a larger number of candidates (or requests), while deferring the choice of the final selections to after all the candidates have been seen. Since there may be a penalty for declining candidates who were part of the shortlist, one would prefer that the shortlist is not much larger than . Another important motivation is theoretical: we wish to understand to what extent this relaxation of the secretary problem can improve the achievable competitive ratio. This question is in the spirit of several other methods of analysis that allow an online algorithm to have additional power, such as resource augmentation [17, 27]. The potential of this relaxation is illustrated by the basic secretary problem, where the aim is to select the item of maximum value among randomly ordered inputs. There, it is not difficult to show that if an algorithm picks every item that is better than the items seen so far, the true maximum will be found, while the expected number of items picked under randomly ordered inputs will be . Further, we show that this approach can be easily modified to get the maximum with probability while picking at most items for any constant . Thus, with just a constant size shortlist, we can break the barrier for the secretary problem and achieve a competitive ratio that is arbitrarily close to ! Motivated by this observation, we ask if a similar improvement can be achieved by relaxing the submodular secretary problem to allow a shortlist. That is, instead of choosing items, the algorithm is allowed to chose items as part of a shortlist, for some function ; and at the end of all inputs, the algorithm chooses items from the selected items. Then, what is the relationship between and the competitive ratio for this problem? Can we achieve a solution close to the best offline solution when is not much bigger than , for example when ? In this paper, we answer this question affirmatively by giving a polynomial time algorithm that achieves competitive ratio for the submodular secretary problem using a shortlist of size . This is surprising since is the best achievable approximation (in polynomial time) for the offline problem. Further, for some special cases of submodular functions, we demonstrate that an shortlist allows us to achieve a competitive ratio. These results demonstrate the power of (small) shortlists for closing the gap between online and offline (polynomial time) algorithms. We also discuss connections of secretary problem with shortlists to the related streaming settings. While a streaming algorithm does not qualify as an online algorithm (even when a shortlist is allowed), we show that our algorithm can in fact be implemented in a streaming setting to use memory buffer; and our results significantly improve the available results for the submodular random order streaming problem.
1.1 Problem Definition
We now give a more formal definition. Items from a set (pool of items) arrive in a uniformly random order over sequential rounds. The set is apriori fixed but unknown to the algorithm, and the total number of items is known to the algorithm. In each round, the algorithm irrevocably decides whether to add the arriving item to a shortlist or not. The algorithm’s value at the end of rounds is given by
where is a monotone submodular function. The algorithm has value oracle access to this function. The optimal offline utility is given by
We say that an algorithm for this problem achieves a competitive ratio using shortlist of size , if at the end of rounds, and . Given the shortlist , since the problem of computing the solution can itself be computationally intensive, our algorithm will also track and output a subset . We will lower bound the competitive ratio by bounding . The above problem definition has connections to some existing problems studied in the literature. The wellstudied online submodular secretary problem described earlier is obtained from the above definition by setting , i.e., it is same as the case when no extra items can be selected as part of a shortlist. Another related problem is submodular random order streaming problem studied in [26]. In this problem, items from a set arrive online in random order and the algorithm aims to select a subset in order to maximize . The streaming algorithm is allowed to maintain a buffer of size . However, this streaming problem is distinct from the submodular secretary problem with shortlists in several important ways. On one hand, since an item previously selected in the memory buffer can be discarded and replaced by a new items, a memory buffer of size does not imply a shortlist of size at most . On the other hand, in the secretary setting, we are allowed to memorize/store more than items without adding them to the shortlist. Thus an algorithm for submodular secretary problemwith shortlist of size may potentially use a buffer of size larger than . Our algorithms, as described in the paper, do use a large buffer, but we will show that the algorithm presented in this paper can in fact be implemented to use only buffer, thus obtaining matching results for the streaming problem.
1.2 Our Results
Our main result is an online algorithm for submodular secretary problem with shortlists that, for any constant , achieves a competitive ratio of with . Note that for submodular secretary problem there is an upper bound of on the achievable aproximation factor, even in the offline setting, and this upper bound applies to our problem for arbitrary size of shortlists. On the other hand for online monotone submodular secretary problem, i.e., when , the best competitive ratio achieved in the literature is [19] Remarkably, with only an size shortlist, our online algorithm is able to achieve a competitive ratio that is arbitrarily close to the offline upper bound of . In the theorem statements below, bigOh notation is used to represent asymptotic behavior with respect to and . We assume the standard value oracle model: the only access to the submodular function is through a black box returning for a given set , and each such queary can be done in time.
Theorem 1.
For any constant , there exists an online algorithm (Algorithm 2) for the submodular secretary problem with shortlists that achieves a competitive ratio of , with shortlist of size . Here, . The running time of this online algorithm is .
Specifically, we have for some constant . The running time of our algorithm is linear in , the size of the input, which is significant as, until recently, it was not known if there exists a linear time algorithm achieving a approximation even for the offline monotone submodular maximization problem under cardinality constraint[23]. Another interesting aspect of our algorithm is that it is highly parallel. Even though the decision for each arriving item may take time that is exponential in (roughly ), it can be readily parallelized among multiple (as many as ) processors. Further, we show an implementation of Algorithm 2 that uses a memory buffer of size at most to get the following result for the problem of submodular random order streaming problem described in the previous section.
Theorem 2.
For any constant , there exists an algorithm for the submodular random order streaming problemthat achieves approximation to OPT while using a memory buffer of size at most . Also, the number of objective function evaluations for each item, amortized over items, is .
The above result significantly improves over the stateoftheart results in random order streaming model [26], which are an approximation ratio of using a memory of size . It is natural to ask whether these lists are, in fact, too powerful. Maybe they could actually allow us to always match the best offline algorithm. We give a negative result in this direction and show that even if we have unlimited computation power, for any function , we can get no better than competitive algorithm using a shortlist of size . Note that with unlimited computational power, the offline problem can be solved exactly. This result demonstrates that having a shortlist does not make the online problem too easy  even with a shortlist (of size ) there is an information theoretic gap between the online and offline problem.
Theorem 3.
No online algorithm (even with unlimited computational power) can achieve a competitive ratio better than for the submodular secretary problem with shortlists, while using a shortlist of size .
Finally, for some special cases of monotone submodular functions, we can asymptotically approach the optimal solution. The first one is the family of functions we call submdular. A function is submodular if it is submodular and there exists a submodular function such that for all :
Theorem 4.
If is an submodular function, there exists an online algorithm for the submodular secretary problem with shortlists that achieves a competitive ratio of with shortlist of size . Here, .
A proof of Theorem 4 along with the relevant algorithm (Algorithm 3) appears in the appendix. Another special case is monotone submodular functions satisfying the following property: , for any and . We can show that the algorithm by Kleinberg [20] asymptotically approaches optimal solution for such functions, but we omit the details.
1.3 Comparison to related work
We compare our results (Theorem 1 and Theorem 2) to the best known results for submodular secretary problem and submodular random order streaming problem, respectively. The best known algorithm so far for submodular secretary problem is by Kesselheim and Tönnis [19], with asymptotic competitive ratio of . In their algorithm, after observing each element, they use an oracle to compute optimal offline solution on the elements seen so far. Therefore it requires exponential time in . The best competitive ratio that they can get in polynomial time is . In comparison, by using a shortlist of size our (polynomial time) algorithm achieves a competitive ratio of . In addition to substantially improves the abovementioned results for submodular secretary problem, this closely matches the best possible offline approximation ratio of in polynomial time. Further, our algorithm is linear time. Table 1 summarizes this comparison. Here, hides the dependence on the constant . The hidden constant in is for some absolute constant .
#selections  Comp ratio  Running time  Comp ratio in poly(n)  

[19]  
this 
In the streaming setting, Chakrabarti and Kale [8] provided a single pass streaming algorithm for monotone submodular function maximization under cardinality constraint, that achieves a approximation under adversarial ordering of input. Further, their algorithm requires function evaluations per arriving item and memory. The currently best known approximation under adversarial order streaming model is by Badanidiyuru et al. [4], who achieve a approximation with a memory of size . There is an upper bound of on the competitive ratio achievable by any streaming algorithm for this problem under adversarial order, while using memory [26]. Hess and Sabato [16] initiated the study of submodular random order streaming problem. Their algorithm uses memory and a total of function evaluations to achieve approximation. The state of the art result in the random order input model is due to NorouziFard et al. [26] who achieve a approximation, while using a memory buffer of size . Table 2 provides a detailed comparison of our result in Theorem 2 to the abovementioned results for submodular random order streaming problem, showing that our algorithm substantially improves the existing results on most aspects of the problem.
Memory size  Approximation ratio  Running time  update time  

[16]  O(1)  
[26]  
[4]  
this  amortized 
There is also a line of work studying the online variant of the submodular welfare maximization problem (e.g., [21, 7, 18]). In this problem, the items arrive online, and each arriving item should be allocated to one of agents with a submodular valuation functions where is the subset of items allocated to th agent). The goal is to partition the arriving items into sets to be allocated to agents, so that the sum of valuations over all agents is maximized. This setting is incomparable with the submodular secretary problem setting considered here.
1.4 Organization
The rest of the paper is organized as follows. Section 2 describes our main algorithm (Algorithm 2) for the submodular secretary problem with shortlists, and demonstrates that its shortlist size is bounded by . In Section 3, we analyze the competitive ratio of this algorithm to prove Theorem 1. In Section 4, we provide an alternate implementation of Algorithm 2 that uses a memory buffer of size at most , in order to prove Theorem 2. Finally, in Section 5, we provide a proof of our impossibility result stated in Theorem 3. The proof of Theorem 4 along with the relevant algorithm appears in the appendix.
2 Algorithm description
Before giving our algorithm for submodular secretary problem with shortlists, we describe a simple technique for secretary problem with shortlists that achieves a competitive ratio for with shortlists of size logarithmic in . Recall that in the secretary problem, the aim is to select an item with expected value close to the maximum among a pool of items arriving sequentially in a uniformly random order. We will consider the variant with shortlists, where we now want to pick a shortlist which contains an item with expected value close to the maximum. We propose the following simple algorithm. For the first rounds, don’t add any items to the shortlist, but just keep track of the maximum value seen so far. For all subsequent rounds, for any arriving item that has a value greater than or equal to the maximum value seen so far, add it to the shortlist if the size of shortlist is less than or equal to . This algorithm is summarized as Algorithm 1. Clearly, for contant , this algorithm uses a shortlist of size . Further, under a uniform random ordering of input, we can show that the maximum value item will be part of the shortlist with probability . (See Proposition 3 in Section 3.)

[leftmargin=0.7in]

number of items , , and

item values , with
where denotes the item in the slot .
There are two main difficulties in extending this idea to the submodular secretary problem with shortlists. First, instead of one item, here we aim to select a set of items using an length shortlist. Second, the contribution of each new item to the objective value, as given by the submodular function , depends on the set of items selected so far. The first main concept we introduce to handle these difficulties is that of dividing the input into sequential blocks that we refer to as windows. Below is the precise construction of windows, for any postivie integers and , such that
is an integer. We use a set of random variables
defined in the following way. Throw balls into bins uniformly at random. Then set to be the number of balls in the th bin. We call the resulting ’s a ballbin random set.Definition 1 ( windows).
Let be a ballbin random set. Divide the indices into slots, where the th slot, , consists of consecutive indices in the natural way, that is, slot contains the first indices, slot contains the next , etc. Next, we define windows, where window consists of consecutive slots, in the same manner as we assigned slots.
Thus, slot is composed of indices , where and . Further, if the ordered the input is then we say that the items inside the slot are To reduce notation, when clear from context, we will use and to also indicate the set of items in the slot and window respectively. When and are large enough constants, some useful properties can be obtained from the construction of these windows and slots. First, roughly items from the optimal set are likely to lie in each of these windows; and further, it is unlikely that two items from will appear in the same slot. (These statements will be made more precise in the analysis where precise setting of in terms of will be provided.) Consequently, our algorithm can focus on identifying a constant number (roughly ) of optimal items from each of these windows, with at most one item coming from each of the slots in a window. The core of our algorithm is a subroutine that accomplishes this task in an online manner using a shortlist of constant size in each window. To implement this idea, we use a greedy selection method that considers all possible sized subsequences of the slots in a window, and aims to identify the subsequence that maximizes the increment over the best items identified so far. More precisely, for any subsequence of the slots in window , we define a ‘greedy’ subsequence of items as:
(1) 
where
(2) 
In (2) and in the rest of the paper, we use shorthand to denote , and to denote , etc. We also will take unions of subsequences, which we interpret as the union of the elements in the subsequences. We also define to be the union of all greedy subsequences of length , and to be the best subsequence among those. That is,
(3) 
and
(4) 
where
(5) 
Note that (refer to (2)) can be set as either an item in slot or an item from a previous greedy subsequence in . The significance of the latter relaxation will become clear in the analysis. As such, identifying the sets and involves looking forward in a slot to find the best item (according to the given criterion in (2)) among all the items in the slot. To obtain an online implementation of this procedure, we use an online subroutine that employs the algorithm (Algorithm 1) for the basic secretary problem described earlier. This online procedure will result in selection of a set potentially larger than , while ensuring that each element from is part of with a high probability at the cost of adding extra items to the shortlist. Note that and can be computed exactly at the end of window . Algorithm 2 summarizes the overall structure of our algorithm. In the algorithm, for any item and set , we define . The algorithm returns both the shortlist which we show to be of size in the following proposition, as well as a set of size at most to compete with . In the next section, we will show that to provide a bound on the competitive ratio of this algorithm.
Proposition 1.
Given , and any constant and , the size of shortlist selected by Algorithm 2 is at most .
Proof.
For each window , and for each of the slots in this window, lines 6 through 7 in Algorithm 2 runs Algorithm 1 for times (for all length subsequences). By construction of Algorithm 1, for each run it will add at most items each time to the shortlist. Therefore, over all windows, Algorithm 2 adds at most items to the shortlist. ∎
3 Bounding the competitive ratio (Proof of Theorem 1)
In this section we show that for any , Algorithm 2 with an appropriate choice of constants , achieves the competitive ratio claimed in Theorem 1 for the submodular secretary problem with shortlists. Recall the following notation defined in the previous section. For any collection of sets , we use to denote . Also, recall that for any item and set , we denote .
Proof overview.
The proof is divided into two parts. We first show a lower bound on the ratio in Proposition 2, where is the subset of items as defined in (4) for every window . Later in Proposition 4, we use the said bound to derive a lower bound on the ratio , where is the subset of shortlist returned by Algorithm 2. Specifically, in Proposition 2, we provide settings of parameters such that of . A central idea in the proof of this result is to show that for every window , given , the items tracked from the previous windows, any of the items from the optimal set has at least probability to appear either in window , or among the tracked items . Further, the items from that appear in window , appear independently, and in a uniformly at random slot in this window. (See Lemma 7.) This observation allows us to show that, in each window, there exists a subsequence of close to slots, such that the greedy sequence of items will be almost “as good as” a randomly chosen sequence of items from . More precisely, denoting , in Lemma 11, for all , we lower bound the increment in function value on adding over the items in as:
We then deduce (using standard techniques for the analysis of greedy algorithm for submodular functions) that
Now, since the length of is close to (as we show in Lemma 13) and since with defined as the “best” subsequence of length (refer to definition of in (5)), we can show that a similar inequality holds for , i.e.,
where depends on the setting of . (See Lemma 15.) Then repeatedly applying this inequality for , and setting appropriately in terms of , we can obtain , completing the proof of Proposition 2 However, a remaining difficulty is that while the algorithm keeps a track of the set for every window , it may not have been able to add all the items in to the shortlist during the online processing of the inputs in that window. In the proof of Proposition 4, we show that in fact the algorithm will add most of the items in to the short list. More precisely, we show that given that an item is in , it will be in shortlist with probability , where is the parameter used while calling Algorithm 1 in Algorithm 2. Therefore, using properties of submodular functions it follows that with , (see Proposition 4). Combining this with the lower bound mentioned earlier, we complete the proof of competitive ratio bound stated in Theorem 1.
3.1 Preliminaries
Lemma 1.
Given a monotone submodular function , and subsets in the domain of , we use to denote . For any set and ,
Lemma 2.
Denote by a random subset of where each element has probability atleast to appear in (not necessarily independently). Then
We will use the following well known deviation inequality for martingales (or supermartingales/submartingales).
Lemma 3 (AzumaHoeffding inequality).
Suppose is a martingale (or supermartingale) and almost surely. Then for all positive integers N and all positive reals ,
And symmetrically (when is a submartingale):
Lemma 4 (Chernoff bound for Bernoulli r.v.).
Let , where with probability and with probability , and all are independent. Let . Then,
for all , and
for all .
3.2 Some useful properties of windows
We first prove some useful properties of windows, defined in Definition 1 and used in Algorithm 2. The first observation is that every item will appear uniformly at random in one of the slots in windows.
Definition 2.
For each item , define as the random variable indicating the slot in which
appears. We call vector
a configuration.Lemma 5.
This follows from the uniform random order of arrivals, and the use of the balls in bins process to determine the number of items in a slot during the construction of windows. A proof is provided in Appendix 6.1. Next, we make important observations about the probability of assignment of items in in the slots in a window , given the sets (refer to (3), (4) for definition of these sets). To aid analysis, we define the following new random variable that will track all the useful information from a window .
Definition 3.
Define , for all length subsequences of the slots in window . Here, is a sequence of items as defined in (1). Also define (Note that ).
Lemma 6.
For any window , and are independent of the ordering of elements within any slot, and are determined by the configuration .
Proof.
Following the above lemma, given a configuration , we will some times use the notation and to make this mapping explicit.
Lemma 7.
For any item , window , and slot in window , define
(6) 
Then, for any pair of slots in windows ,
(7) 
Proof.
If then the statement of the lemma is trivial, so consider . For such , . We show that for any pair of slots , where is a slot in first windows and is a slot in window ,
(8) 
And, for any pair of slots in windows ,
(9) 
To see (8), suppose for a configuration we have and . Since , then by definition of we have that for any length subsequence of slots in any of the windows . Therefore, if we remove from windows (i.e., consider another configuration where is in windows ) then would not change. This is because is not the output of argmax in definition of (refer to (1)) for any , so that its removal will not change the output of argmax. Also by adding to slot , will not change since is not in window . Suppose configuration is a new configuration obtained from by changing from to . Therefore . Also remember that from lemma 19, . This mapping shows that . The proof for (9) is similar. By applying Bayes’ rule to (8) we have
Also from Lemma 5, thus
Now, for any pair of slots in windows , by applying Bayes’ rule to the equation (9), we have . That is, has as much probability to appear in or as any of the other (at most ) slots in windows . As a result . ∎
Lemma 8.
For any window , and , the random variables and are independent. That is, given , items appear in any slot in independently.
Proof.
To prove this, we show that . Suppose is a configuration such that and , and . Assume there exists another feasible slot assignment of , i.e., there is another configuration such that and where . (If no such configuration exists, then is always , and the desired lemma statement is trivially true.) Then, we prove the desired independence by showing that there exists a feasible configuration where slot assignment of is , and is . This is obtained by changing from to in , to obtain another configuration . In Lemma 9, we show that this change will not effect , i.e., . Thus configuration satisfies the desired statement. ∎
Lemma 9.
Fix a slot , , and . Suppose that there exists some configuration such that and . Then, given any configuration with , we can replace with to obtain a new configuration that also satisfies .
Proof.
Suppose the slot lies in window . If then the statement is trivial. So suppose . Create an intermediate configuration by removing the item from , call it . Since we have . In fact, for every subsequence , the greedy subsequence for , will be same as that for , i.e., . Now add item to slot in , to obtain configuration . We claim
Comments
There are no comments yet.