[Slope Optimization] POJ-3709: K-Anonymous Sequence

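Preface

Earlier I solved a "basic slope-optimization template problem" (https://blog.csdn.net/qq_36294918/article/details/103641411),

and then, full of confidence, I looked at this problem...

What on earth? This expression is nothing but additions and subtractions, so where is the slope supposed to come from? I was completely lost...

Only after reading other people's blogs did I realize that I had never fully understood [slope optimization] in the first place... embarrassing...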

Problem

Description

The explosively increasing network data in various application domains has raised privacy concerns for the individuals involved. Recent studies show that simply removing the identities of nodes before publishing the graph/social network data does not guarantee privacy. The structure of the graph itself, along with its basic form the degree of nodes, can reveal the identities of individuals.

To address this issue, we study a specific graph-anonymization problem. We call a graph k-anonymous if for every node v, there exist at least k-1 other nodes in the graph with the same degree as v. And we are interested in achieving k-anonymous on a graph with the minimum number of graph-modification operations.

We simplify the problem. Pick n nodes out of the entire graph G and list their degrees in ascending order. We define a sequence k-anonymous if for every element s, there exist at least k-1 other elements in the sequence equal to s. To let the given sequence k-anonymous, you could do one operation only—decrease some of the numbers in the sequence. And we define the cost of the modification the sum of the difference of all numbers you modified. e.g. sequence 2, 2, 3, 4, 4, 5, 5, with k=3, can be modified to 2, 2, 2, 4, 4, 4, 4, which satisfy 3-anonymous property and the cost of the modification will be |3-2| + |5-4| + |5-4| = 3.

Give a sequence with n numbers in ascending order and k, we want to know the modification with minimal cost among all modifications which adjust the sequence k-anonymous.

Input

The first line of the input file contains a single integer T (1 ≤ T ≤ 20) – the number of tests in the input file. Each test starts with a line containing two numbers n (2 ≤ n ≤ 500000) – the amount of numbers in the sequence and k (2 ≤ k ≤ n). It is followed by a line with n integer numbers: the degree sequence in ascending order. And every number s in the sequence is in the range [0, 500000].

Output

For each test, output one line containing a single integer—the minimal cost.

Sample Input

2
7 3
2 2 3 4 4 5 5
6 2
0 3 3 4 8 9

Sample Output

3
5

Source

POJ Founder Monthly Contest – 2008.12.28, Rainer

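Problem summary

Split an ascending sequence of N elements into groups, each containing at least K elements.

For each group, compute the sum of the differences between every element and the group's minimum element; the total of these values over all groups must be as small as possible.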

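Analysis

[Analysis] Reference & thanks: http://blog.sina.com.cn/s/blog_5f5353cc0100jxxo.html

[Code] Reference & thanks: https://www.cnblogs.com/wmj6/p/10800045.html

(The experts have already explained everything in detail, so I won't repeat it and will just paste their explanation (~ = ̄ω ̄=)~)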


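Restating the problem: split an ascending sequence of N elements into groups, each containing at least K elements. For each group, sum the differences between every element and the group's minimum element; the total over all groups must be as small as possible.

One conclusion follows easily: forming a group from contiguous elements gives a result at least as good as forming it from scattered ones (easy to prove, so the proof is omitted).

This gives the DP equation, where dp[i] is the minimum cost for the first i elements, sum[i] is the prefix sum arr[1]+...+arr[i], and the last group is arr[j+1..i], all lowered to arr[j+1]:

dp[i]=MIN(dp[j]+sum[i]-sum[j]-(i-j)*arr[j+1]); j<i-k+1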
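To make the recurrence concrete, here is a minimal O(n^2) reference sketch of the DP above. It is not the optimized solution, only a direct transcription that is handy for checking small inputs. The function name `bruteForceCost` and the 0-based input vector are assumptions made for this illustration, and states with 0 < j < k are treated as unreachable so that every group has at least k elements.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper (not from the original post): O(n^2) version of
// dp[i] = min(dp[j] + sum[i] - sum[j] - (i-j)*arr[j+1]) for j <= i-k.
long long bruteForceCost(const std::vector<long long>& arr, int k) {
    int n = static_cast<int>(arr.size());
    assert(k >= 1 && k <= n);
    const long long INF = 1LL << 60;           // marks unreachable states
    std::vector<long long> sum(n + 1, 0), dp(n + 1, INF);
    for (int i = 1; i <= n; ++i) sum[i] = sum[i - 1] + arr[i - 1];
    dp[0] = 0;                                  // empty prefix costs nothing
    for (int i = k; i <= n; ++i) {
        for (int j = 0; j + k <= i; ++j) {      // last group is elements j+1..i
            if (dp[j] >= INF) continue;         // prefix j cannot be grouped validly
            // arr[j] (0-based) is arr[j+1] in the article's 1-based notation
            long long cost = dp[j] + (sum[i] - sum[j])
                           - static_cast<long long>(i - j) * arr[j];
            dp[i] = std::min(dp[i], cost);
        }
    }
    return dp[n];
}
```

Called on the two samples from the statement, `bruteForceCost({2,2,3,4,4,5,5}, 3)` and `bruteForceCost({0,3,3,4,8,9}, 2)` reproduce the expected answers 3 and 5.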


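Now for the slope optimization (ignoring, for the moment, the requirement of at least K elements per group):

1. Prove that a better decision point stays better for all later states

  The proof is simple and omitted here; if you are interested, see the previous article.

2. Derive the slope inequality: in general, move the terms in J and K to the left and the terms in I to the right

  Assume J<K and that decision K is at least as good as decision J for state i. Then:

 dp[j]+sum[i]-sum[j]-(i-j)*arr[j+1]>= dp[k]+sum[i]-sum[k]-(i-k)*arr[k+1]

Simplifying:

dp[j]-dp[k]-sum[j]+sum[k]+j*arr[j+1]-k*arr[k+1]>=i* (arr[j+1]-arr[k+1])

Let

G(k,j)= dp[j]-dp[k]-sum[j]+sum[k]+j*arr[j+1]-k*arr[k+1]

S(k,j)= arr[j+1]-arr[k+1]

Then the inequality becomes

G(k,j)>=i*S(k,j)

that is, G(k,j)/S(k,j)<=i. Remember to flip the inequality sign, because S(k,j)<0 (the sequence is ascending and j<k). If arr[j+1]=arr[k+1], then S(k,j)=0 and the quotient is undefined, so the product form G(k,j)>=i*S(k,j) has to be used instead.

Let X(k,j)= G(k,j)/S(k,j)

So the slope inequality is:

X(k,j)<=i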
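For readers who want the intermediate step spelled out, the rearrangement can be written as follows. This only restates the inequality above with the same symbols (dp, sum, arr, G, S); nothing new is assumed beyond the ascending order of arr.

```latex
\begin{aligned}
& dp[j] + sum[i] - sum[j] - (i-j)\,arr[j+1] \;\ge\; dp[k] + sum[i] - sum[k] - (i-k)\,arr[k+1] \\
\iff\; & dp[j] - sum[j] + j\,arr[j+1] - i\,arr[j+1] \;\ge\; dp[k] - sum[k] + k\,arr[k+1] - i\,arr[k+1] \\
\iff\; & \underbrace{dp[j]-dp[k]-sum[j]+sum[k]+j\,arr[j+1]-k\,arr[k+1]}_{G(k,j)} \;\ge\; i\,\underbrace{\bigl(arr[j+1]-arr[k+1]\bigr)}_{S(k,j)}
\end{aligned}
```

The second line cancels sum[i] on both sides and expands the two products; the third line collects everything that does not depend on i on the left.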

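3. Define the queue maintenance rules

Head maintenance:

  Let A,B (A<B) be the first two elements of the queue. If X(B,A)<=i, then B is at least as good as A, so delete A; otherwise no maintenance is needed.

Tail maintenance:

    Let A,B,C (A<B<C) be the elements at the tail of the queue

a. X(B,A)<=i, X(C,B)<=i: C is at least as good as B, and B is at least as good as A

b. X(B,A)<=i, X(C,B)>i: B is better than C, and B is at least as good as A, so B is the best of the three

c. X(B,A)>i: A is better than B

In cases a and c, B is deleted directly; in case b, B is kept. Case b can be restated as X(B,A)<X(C,B).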
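The two maintenance rules can be sketched as small predicates. This is only an illustration under assumptions: the names `headIsObsolete` and `middleIsUseless` are hypothetical, the G and S values are passed in as plain numbers, and both tests are written in product form so that no division by a zero S is needed.

```cpp
#include <cassert>

// Hypothetical helper: true if the head A can be dropped for state i.
// g = G(B,A) and s = S(B,A) in the article's notation, with A<B, so s <= 0.
bool headIsObsolete(long long g, long long s, long long i) {
    assert(s <= 0);
    return g >= i * s;               // product form of X(B,A) <= i
}

// Hypothetical helper: true if the middle point B can be dropped.
// gBA, sBA belong to the pair (A,B); gCB, sCB to the pair (B,C); A<B<C.
bool middleIsUseless(long long gBA, long long sBA,
                     long long gCB, long long sCB) {
    assert(sBA <= 0 && sCB <= 0);
    // Multiplying X(B,A) >= X(C,B) by sBA*sCB >= 0 keeps the direction.
    return gBA * sCB >= gCB * sBA;
}
```

B is kept exactly when `middleIsUseless` returns false, which corresponds to the condition X(B,A)<X(C,B) from case b.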

 

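Now consider the requirement that each group has at least K elements.

To handle it, it is enough to delay the moment at which a candidate is added to the queue.

If candidates are added with a delay of K-1 rounds, the previous group may still end up with fewer than K elements.

If they are added with a delay of 2*k-1 rounds, this cannot happen. In that case the index to add is i-k+1 (assuming the current round is round I).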

 

Pay special attention to overflow during the computation (in other words, use long long). I got WA three times because I overlooked this...
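As a rough size check under the limits in the statement (n and every value at most 500000), a single term such as j*arr[j+1] can already reach 2.5 * 10^11, far beyond a 32-bit int. The helper below is a hypothetical illustration (the name `worstCaseProduct` is not from the original post) of widening before multiplying.

```cpp
#include <cassert>
#include <climits>

// Hypothetical helper: largest possible value of j * arr[j+1]
// when j <= n and arr[j+1] <= maxValue.
long long worstCaseProduct(int n, int maxValue) {
    assert(n >= 0 && maxValue >= 0);
    // Cast first: int * int would overflow before the result is widened.
    return static_cast<long long>(n) * maxValue;
}
```

With n = maxValue = 500000 this gives 250000000000, which is larger than INT_MAX. The cross-multiplied slope comparison multiplies a G value by an S value, which by a rough estimate stays below 10^18 under these limits and therefore still fits in long long.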


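[Thoughts triggered by "a small problem that showed up after the code was finished"]

When I wrote slope-optimization problems before, I habitually typed:

int head=0,tail=0;
for(int i=len;i<=n;i++)
{
    int now=i-len;
    //maintain the tail
    ...
    //maintain the head
    ...
}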

This time, I first wrote it the way other people do:

int head=0,tail=0;
for(int i=len;i<=n;i++)
{
    int now=i-len+1;//note the extra +1 here
    //maintain the head
    ...
    //maintain the tail
    ...
}

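It got AC. Nice. Now let's change it back to the form I am used to:

int now=i-len+1; --> int now=i-len;

I ran it and... what? It no longer passes the samples?

I stared at it for a long time without finding the cause, so I asked mys毛, and finally learned where the problem was QAQ (many thanks!):

What actually matters is [the order in which the head and the tail are maintained]. If now=i-len is used while the head is maintained first, the result is wrong...

The rough cause seems to be something about [open versus closed intervals]... I am too weak to work it out orz... / so what kind of "thoughts" are these / (crossed out)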
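A possible explanation, offered as an interpretation rather than something stated in the original post: computing dp[i] requires candidate j=i-len to be in the queue already. With now=i-len+1 appended at the end of round i, that candidate is ready for round i+1. With now=i-len appended at the end of round i, it arrives one round late, so every group after the first is forced to contain at least len+1 elements. The sketch below uses a hypothetical function `solveWithOrder` and a hypothetical flag `pushBeforeQuery` (neither appears in the original code) to compare the two orders with now=i-len.

```cpp
#include <cassert>
#include <deque>
#include <vector>

// Hypothetical demo: same DP as the article, with a flag that decides whether
// candidate now = i - k enters the queue before or after dp[i] is computed.
long long solveWithOrder(const std::vector<long long>& arr, int k,
                         bool pushBeforeQuery) {
    int n = static_cast<int>(arr.size());
    assert(k >= 1 && k <= n);
    std::vector<long long> a(n + 2, 0), pre(n + 1, 0), dp(n + 1, 0);
    for (int i = 1; i <= n; ++i) { a[i] = arr[i - 1]; pre[i] = pre[i - 1] + a[i]; }

    // G and S for a pair j < t, in the argument order used by the article's code.
    auto G = [&](int j, int t) {
        return dp[j] - dp[t] - pre[j] + pre[t] + j * a[j + 1] - t * a[t + 1];
    };
    auto S = [&](int j, int t) { return a[j + 1] - a[t + 1]; };

    std::deque<int> q{0};                       // candidate 0 is always available
    auto push = [&](int c) {                    // tail maintenance, then append c
        while (q.size() >= 2) {
            int A = q[q.size() - 2], B = q.back();
            if (G(A, B) * S(B, c) >= G(B, c) * S(A, B)) q.pop_back();
            else break;
        }
        q.push_back(c);
    };

    for (int i = k; i <= n; ++i) {
        int now = i - k;
        if (pushBeforeQuery && now >= k) push(now);
        while (q.size() >= 2 && G(q[0], q[1]) >= i * S(q[0], q[1])) q.pop_front();
        int j = q.front();
        dp[i] = dp[j] + (pre[i] - pre[j]) - a[j + 1] * (i - j);
        if (!pushBeforeQuery && now >= k) push(now);   // too late for dp[i]
    }
    return dp[n];
}
```

On the second sample (0 3 3 4 8 9 with K=2), pushing before the query gives 5, while pushing after it gives 15, because groups of exactly two elements can no longer be formed after the first group. The first sample still yields 3 in both orders, so it does not expose the difference.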

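Code

//Classic slope-optimization example
#include<cstdio>
#include<cstring>
using namespace std;
typedef long long ll;
const int MAXN=500000;
ll a[MAXN+5],pre[MAXN+5],dp[MAXN+5];//a: input sequence, pre: prefix sums, dp: minimum cost
int q[MAXN+5];//monotone queue of candidate decision points
int n,len;//len is the K from the statement
ll G(int j,int k)//y difference (numerator of the slope), called with j<k
{
	return dp[j]-dp[k]-pre[j]+pre[k]+j*a[j+1]-k*a[k+1];
}
ll S(int j,int k)//x difference (denominator of the slope), <=0 when j<k
{
	return a[j+1]-a[k+1];
}
void Solve()
{
	memset(q,0,sizeof(q));//q[0]=0: decision point 0 is in the queue from the start
	memset(dp,0,sizeof(dp));
	int head=0,tail=0;
	for(int i=len;i<=n;i++)
	{
		int now=i-len;//newest decision point that leaves at least len elements for the last group
		//maintain the tail: only points with now>=len are valid group ends
		if(now>=len)
		{
			//cross-multiplied comparison of X(q[tail-1],q[tail]) and X(q[tail],now), avoids division
			while(head<tail&&G(q[tail-1],q[tail])*S(q[tail],now)>=G(q[tail],now)*S(q[tail-1],q[tail]))
				tail--;
			q[++tail]=now;
		}
		//maintain the head: drop q[head] while q[head+1] is at least as good for state i
		while(head<tail&&G(q[head],q[head+1])>=i*S(q[head],q[head+1]))
			head++;
		dp[i]=dp[q[head]]+(pre[i]-pre[q[head]])-a[q[head]+1]*(i-q[head]);
	}
	printf("%lld\n",dp[n]);
}
int main()
{
	int t;
	scanf("%d",&t);
	while(t--)
	{
		scanf("%d%d",&n,&len);
		for(int i=1;i<=n;i++)
		{
			scanf("%lld",&a[i]);
			pre[i]=pre[i-1]+a[i];//pre[0] is 0 because pre is a global array
		}
		Solve();
	}
	return 0;
}
/*
Assume j<k and that decision k is at least as good as decision j (sum is the array pre in the code).
Then: dp[j]-dp[k]-sum[j]+sum[k]+j*a[j+1]-k*a[k+1]>=i*(a[j+1]-a[k+1])
Let G(j,k)=dp[j]-dp[k]-sum[j]+sum[k]+j*a[j+1]-k*a[k+1], S(j,k)=a[j+1]-a[k+1] (<=0)
Then G(j,k)>=i*S(j,k), i.e. G(j,k)/S(j,k)<=i when S(j,k)<0 (the sign flips)
Let X(j,k)=G(j,k)/S(j,k)
So the slope inequality is: X(j,k)<=i
That is, when j,k satisfy X(j,k)<=i, k is at least as good as j
*/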

 
