【Slope Optimization】POJ-3709: K-Anonymous Sequence

Preface

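Earlier I solved a "basic slope-optimization template problem" (https://blog.csdn.net/qq_36294918/article/details/103641411),

and then, full of confidence, I looked at this problem, only to find:

What on earth? The formula is all pluses and minuses, so where is the slope supposed to come from? /completely lost... ...

Later, after reading other people's blogs, I realized that I had never fully understood 【slope optimization】 in the first place... (sweat)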

Problem

Description

The explosively increasing network data in various application domains has raised privacy concerns for the individuals involved. Recent studies show that simply removing the identities of nodes before publishing the graph/social network data does not guarantee privacy. The structure of the graph itself, along with its basic form the degree of nodes, can reveal the identities of individuals.

To address this issue, we study a specific graph-anonymization problem. We call a graph k-anonymous if for every node v, there exist at least k-1 other nodes in the graph with the same degree as v. And we are interested in achieving k-anonymous on a graph with the minimum number of graph-modification operations.

We simplify the problem. Pick n nodes out of the entire graph G and list their degrees in ascending order. We define a sequence k-anonymous if for every element s, there exist at least k-1 other elements in the sequence equal to s. To let the given sequence k-anonymous, you could do one operation only—decrease some of the numbers in the sequence. And we define the cost of the modification the sum of the difference of all numbers you modified. e.g. sequence 2, 2, 3, 4, 4, 5, 5, with k=3, can be modified to 2, 2, 2, 4, 4, 4, 4, which satisfy 3-anonymous property and the cost of the modification will be |3-2| + |5-4| + |5-4| = 3.

Give a sequence with n numbers in ascending order and k, we want to know the modification with minimal cost among all modifications which adjust the sequence k-anonymous.

Input

The first line of the input file contains a single integer T (1 ≤ T ≤ 20) – the number of tests in the input file. Each test starts with a line containing two numbers n (2 ≤ n ≤ 500000) – the amount of numbers in the sequence and k (2 ≤ k ≤ n). It is followed by a line with n integer numbers: the degree sequence in ascending order. And every number s in the sequence is in the range [0, 500000].

Output

For each test, output one line containing a single integer—the minimal cost.

Sample Input

2
7 3
2 2 3 4 4 5 5
6 2
0 3 3 4 8 9

Sample Output

3
5

Source

POJ Founder Monthly Contest – 2008.12.28, Rainer

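Summary

Split an ascending sequence of N elements into groups, each with at least K elements.

For each group, add up the differences between each element and the group's minimum; the sum of these values over all groups must be as small as possible.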
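To make the cost definition concrete, here is a small C++ sketch that computes the cost of one group (the helper name groupCost is hypothetical, not taken from the referenced solutions). Because the input is ascending, every element of a group is lowered to the group's first element. On the first sample, the split {2,2,3} | {4,4,5,5} costs 1 + 2 = 3, which matches the expected output.

```cpp
#include <cassert>
#include <vector>

// Cost of one group of an ascending sequence: each element is decreased to the
// group minimum, which is the group's first element because the input is sorted.
// The group is a[lo..hi] (0-indexed, inclusive); groupCost is a hypothetical helper.
long long groupCost(const std::vector<long long>& a, int lo, int hi) {
    assert(0 <= lo && lo <= hi && hi < static_cast<int>(a.size()));
    long long cost = 0;
    for (int t = lo; t <= hi; ++t) cost += a[t] - a[lo];
    return cost;
}
```

For the second sample (0 3 3 4 8 9, k=2), the groups {0,3}, {3,4}, {8,9} cost 3 + 1 + 1 = 5.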

Analysis

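【Analysis】Reference & thanks: http://blog.sina.com.cn/s/blog_5f5353cc0100jxxo.html

【Code】Reference & thanks: https://www.cnblogs.com/wmj6/p/10800045.html

(The experts have already explained it in great detail, so I won't repeat it; here is their analysis directly (~ = ̄ω ̄=)~)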


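Restating the problem: split an ascending sequence of N elements into groups, each with at least K elements; for each group, add up the differences between each element and the group's minimum, and make the total over all groups as small as possible.

It is easy to see that grouping contiguous elements gives a better result than grouping scattered ones (easy to prove, so the proof is omitted).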

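From this we get the DP equation (sum is the prefix sum, arr is 1-indexed, dp[0]=0, and the last group arr[j+1..i] is lowered to its first element arr[j+1]):

dp[i]=MIN(dp[j]+sum[i]-sum[j]-(i-j)*arr[j+1]), j<i-k+1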
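As a baseline, the recurrence can be evaluated directly in O(n^2) before any optimization. This is a sketch under my own assumptions: arr is 1-indexed with arr[0] unused, and split points j that cannot themselves be partitioned (1 <= j < k) are skipped by leaving dp[j] at "infinity". It is far too slow for n = 500000, but it is handy for checking the optimized code on small inputs.

```cpp
#include <cassert>
#include <climits>
#include <vector>

// Direct O(n^2) evaluation of the recurrence, no slope optimization.
// arr is 1-indexed (arr[0] is an unused placeholder) and ascending.
long long bruteForceDP(const std::vector<long long>& arr, int k) {
    int n = static_cast<int>(arr.size()) - 1;
    assert(1 <= k && k <= n);
    std::vector<long long> sum(n + 1, 0), dp(n + 1, LLONG_MAX);
    for (int i = 1; i <= n; ++i) sum[i] = sum[i - 1] + arr[i];
    dp[0] = 0;  // the empty prefix costs nothing
    for (int i = k; i <= n; ++i) {
        for (int j = 0; j <= i - k; ++j) {     // j < i-k+1: last group has >= k elements
            if (dp[j] == LLONG_MAX) continue;  // prefix 1..j cannot be split (1 <= j < k)
            long long cand = dp[j] + sum[i] - sum[j] - static_cast<long long>(i - j) * arr[j + 1];
            if (cand < dp[i]) dp[i] = cand;
        }
    }
    return dp[n];
}
```

On the two samples this gives 3 and 5.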


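Now apply slope optimization (ignoring the at-least-K-per-group constraint for now):

1. Show that once a decision point is better, it stays better for all later states

  The proof is simple, so I skip it; see the previous article if you are interested.

2. Derive the slope equation: usually rearrange it so that the terms in j and k are on the left and the term in i is on the right

  Suppose j<k and decision k is better than decision j; then: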

 dp[j]+sum[i]-sum[j]-(i-j)*arr[j+1]>= dp[k]+sum[i]-sum[k]-(i-k)*arr[k+1]

Simplifying gives:

dp[j]-dp[k]-sum[j]+sum[k]+j*arr[j+1]-k*arr[k+1]>=i* (arr[j+1]-arr[k+1])

G(k,j)= dp[j]-dp[k]-sum[j]+sum[k]+j*arr[j+1]-k*arr[k+1]

S(k,j)= arr[j+1]-arr[k+1]

Then the inequality becomes

G(k,j)>=i*S(k,j)

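i.e. G(k,j)/S(k,j)<=i. Remember to flip the inequality, because S(k,j)<=0 (the sequence is ascending). When S(k,j)=0 the division is undefined, which is why the code compares cross products instead of dividing.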

X(k,j)= G(k,j)/S(k,j)

So the slope equation is:

X(k,j)<=i
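The rearrangement can be checked mechanically: the left side minus the right side of the original inequality is exactly G - i*S, so the two forms always agree. The sketch below (the struct and method names are hypothetical) follows the code's argument order G(j,k), S(j,k) with j < k, and uses arbitrary dp values, since the identity does not depend on dp being optimal.

```cpp
#include <cassert>
#include <vector>

// arr, sum, dp are 1-indexed arrays filled in by the caller (hypothetical inputs).
struct Derivation {
    std::vector<long long> arr, sum, dp;

    // Value of choosing split point j for state i: dp[j] + cost of group j+1..i.
    long long value(int j, int i) const {
        return dp[j] + sum[i] - sum[j] - static_cast<long long>(i - j) * arr[j + 1];
    }
    long long G(int j, int k) const {
        return dp[j] - dp[k] - sum[j] + sum[k] +
               static_cast<long long>(j) * arr[j + 1] - static_cast<long long>(k) * arr[k + 1];
    }
    long long S(int j, int k) const { return arr[j + 1] - arr[k + 1]; }

    // "k is no worse than j for state i", tested directly and in rearranged form.
    bool agrees(int j, int k, int i) const {
        bool direct = value(j, i) >= value(k, i);
        bool rearranged = G(j, k) >= static_cast<long long>(i) * S(j, k);
        return direct == rearranged;
    }
};
```

With the first sample and dp[3]=1, state i=7 gives value(0,7)=11 and value(3,7)=3; correspondingly G(0,3)=-6 and S(0,3)=-2, so X = 3 <= 7 and split point 3 wins.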

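3. Define the queue maintenance rules

Maintaining the head:

  Suppose A, B (A<B) are the first two elements of the queue. If X(B,A)<=i, then B is better than A, so delete A; otherwise nothing needs to be done.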

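Maintaining the tail:

    Suppose A, B, C (A<B<C) are the last elements of the queue, C being the one just added.

a. X(B,A)<=i, X(C,B)<=i: C is better than B, and B is better than A

b. X(B,A)<=i, X(C,B)>i: B is better than C and better than A, so B is the best of the three

c. X(B,A)>i: A is better than B

In cases a and c, delete B directly; in case b, keep it. Case b can be rewritten as X(B,A)<X(C,B).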
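The two maintenance rules turn into integer comparisons. A hedged sketch (the function names are hypothetical; gAB and sAB stand for the code's G(A,B) and S(A,B) with A < B): instead of dividing by S, the tail rule multiplies through by S(A,B)*S(B,C), which is non-negative because both factors are <= 0, so the inequality keeps its direction. This is the same cross-multiplied test as the tail while-loop in Solve().

```cpp
#include <cassert>

// Tail rule: for queue entries A < B < C, B survives only if X(B,A) < X(C,B).
// Returns true when B should be popped. Every S is <= 0 since the input is ascending.
bool shouldPopTail(long long gAB, long long sAB, long long gBC, long long sBC) {
    assert(sAB <= 0 && sBC <= 0);
    return gAB * sBC >= gBC * sAB;  // same as gAB/sAB >= gBC/sBC when both S < 0
}

// Head rule: drop the front entry A once B is no worse for the current state i,
// i.e. G(A,B) >= i * S(A,B), which is X(B,A) <= i before dividing by S.
bool shouldPopHead(long long gAB, long long sAB, long long i) {
    assert(sAB <= 0);
    return gAB >= i * sAB;
}
```

For example, with X(B,A)=3 (gAB=-6, sAB=-2) and X(C,B)=5 (gBC=-10, sBC=-2), B is kept; and at state i=7 the head entry A is dropped because 3 <= 7.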

 

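OK, now let's take the constraint "each group has at least K elements" into account.

To handle it, we only need to delay the moment when a decision point is added to the queue.

If insertion is delayed by K-1 rounds, the previous group may end up with fewer than K elements.

If it is delayed by 2*K-1 rounds, this cannot happen. But then the number added in round I should be I-K+1.

(That is the reference code's convention. In the code of this post the same idea reads: in round i the candidate now=i-len is pushed only if now>=len, so the usable split points are j=0, which is in the queue from the start, and K<=j<=i-K.)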
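The set of legal split points per round can be spelled out directly. This small sketch (legalCandidates is a hypothetical name) lists them for state i: the last group j+1..i needs i-j >= k, and the prefix 1..j must itself be splittable, which only holds for j == 0 or j >= k. These are exactly the points the delayed insertion makes available.

```cpp
#include <cassert>
#include <vector>

// Split points j that dp[i] may use under the "at least k per group" rule.
std::vector<int> legalCandidates(int i, int k) {
    assert(k >= 1);
    std::vector<int> js;
    for (int j = 0; j <= i - k; ++j)
        if (j == 0 || j >= k) js.push_back(j);  // 1 <= j < k cannot form valid groups
    return js;
}
```

With k=3: round 5 may only use {0}, round 6 uses {0, 3}, round 7 uses {0, 3, 4}.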

 

Be especially careful about overflow during the computation (in other words, use long long). I got WA three times because I overlooked this...
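A rough size estimate shows why int is not enough. The constants below are my own loose bounds derived from the input limits (n <= 500000, values <= 500000), not figures from the original post: dp alone can reach about 2.5e11, and the cross products in the tail test stay below about 5e17, which still fits in a signed 64-bit long long.

```cpp
#include <cassert>
#include <climits>

constexpr long long kMaxN = 500000, kMaxV = 500000;
// dp[i] is at most the sum of all values.
constexpr long long kMaxDp = kMaxN * kMaxV;      // 2.5e11, already > INT_MAX
// |G| <= |dp[j]-dp[k]| + |sum[k]-sum[j]| + |j*a[j+1]-k*a[k+1]| <= 3*n*V; use 4*n*V as margin.
constexpr long long kMaxG = 4 * kMaxN * kMaxV;   // 1e12
// The tail test multiplies a G by an S, and |S| <= V.
constexpr long long kMaxCross = kMaxG * kMaxV;   // 5e17
static_assert(kMaxDp > INT_MAX, "dp values do not fit in int");
static_assert(kMaxCross < LLONG_MAX, "cross products fit in long long");
```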


【由"一個在寫完代碼後發生的小問題"引發的思考】

之前自己寫斜率優化的題目時,都習慣性地打:

int head=0,tail=0;
for(int i=len;i<=n;i++)
{
    int now=i-len;
    // process the tail
    ...
    // process the head
    ...
}

This time, however, I first wrote it the way someone else did:

int head=0,tail=0;
for(int i=len;i<=n;i++)
{
    int now=i-len+1;// added 1 here
    // process the head
    ...
    // process the tail
    ...
}

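Accepted, nice. Now let me switch it to the layout I'm used to:

int now=i-len+1;  →  int now=i-len;

Then I ran it and... huh?! What's going on here, little bro, why doesn't it even pass the sample now?

I stared at it for ages without getting anywhere, so I asked the expert mys (Mao), and finally found out where the problem was QAQ (huge thanks!):

It turns out that what actually matters is 【the order in which the head and the tail are processed】: with now=i-len, processing the head first gives wrong answers...

The rough reason seems to be something about 【open vs. closed intervals】... I'm too weak to work it out orz... /so what kind of "reflection" is this/strikethrough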
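One way to read the "open vs. closed intervals" hint (my own interpretation, not from the original discussion, and assuming dp[i] is computed right after the head is processed): with now=i-len, the split point j=i-len becomes legal exactly in round i, because the last group j+1..i then has exactly K elements, so it must be pushed before dp[i] is computed. With now=i-len+1 the pushed point is only needed from round i+1 on, so pushing it after computing dp[i] is correct. Combining now=i-len with head-first means j=i-len is only usable from round i+1, which effectively forces every group after the first one to have at least K+1 elements. The sketch below (solveVariant and its variant numbering are hypothetical) puts the three layouts side by side.

```cpp
#include <cassert>
#include <vector>

// a is 1-indexed (a[0] unused) and ascending; len is K.
// variant 0: now=i-len, push then query (this post's Solve()).
// variant 1: now=i-len+1, query then push (the reference layout).
// variant 2: now=i-len, query then push (the combination that failed the sample).
long long solveVariant(const std::vector<long long>& a, int len, int variant) {
    int n = static_cast<int>(a.size()) - 1;
    assert(2 <= len && len <= n);
    std::vector<long long> pre(n + 1, 0), dp(n + 1, 0);
    for (int i = 1; i <= n; ++i) pre[i] = pre[i - 1] + a[i];
    auto G = [&](int j, int k) {
        return dp[j] - dp[k] - pre[j] + pre[k] + j * a[j + 1] - k * a[k + 1];
    };
    auto S = [&](int j, int k) { return a[j + 1] - a[k + 1]; };
    std::vector<int> q(n + 2, 0);  // q[0] = 0: split point 0 is in the queue from the start
    int head = 0, tail = 0;
    auto push = [&](int now) {
        if (now < len) return;  // a split point must be 0 or >= len
        while (head < tail && G(q[tail - 1], q[tail]) * S(q[tail], now) >=
                                  G(q[tail], now) * S(q[tail - 1], q[tail]))
            tail--;
        q[++tail] = now;
    };
    for (int i = len; i <= n; ++i) {
        int now = (variant == 1) ? i - len + 1 : i - len;
        if (variant == 0) push(now);  // j = i-len must be usable for dp[i] itself
        while (head < tail && G(q[head], q[head + 1]) >= i * S(q[head], q[head + 1]))
            head++;
        int j = q[head];
        dp[i] = dp[j] + pre[i] - pre[j] - a[j + 1] * (i - j);
        if (variant != 0) push(now);  // variant 1: now is meant for round i+1
    }
    return dp[n];
}
```

Variants 0 and 1 both give 3 and 5 on the samples; variant 2 returns 15 on the second sample, because the optimal split 0 3 | 3 4 | 8 9 needs a final group of exactly 2 elements.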

Code

//Classic slope-optimization example
#include<cstdio>
#include<cstring>
#include<cmath>
#include<iostream>
#include<algorithm>
using namespace std;
typedef long long ll;
const ll MAXN=5e5,INF=(1LL<<60);
ll a[MAXN+5],pre[MAXN+5],q[MAXN+5],dp[MAXN+5];
int n,len;
ll G(int j,int k)//y
{
	return dp[j]-dp[k]-pre[j]+pre[k]+j*a[j+1]-k*a[k+1];
}
ll S(int j,int k)//x
{
	return a[j+1]-a[k+1];
}
void Solve()
{
	memset(q,0,sizeof(q));//q[0]=0: split point 0 is in the queue from the start
	memset(dp,0,sizeof(dp));
	int head=0,tail=0;
	for(int i=len;i<=n;i++)
	{
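		int now=i-len;//split point j=i-len becomes legal in round i (last group has exactly len elements)
		//process the tail
		if(now>=len)//a split point must be 0 or >=len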
		{
			while(head<tail&&G(q[tail-1],q[tail])*S(q[tail],now)>=G(q[tail],now)*S(q[tail-1],q[tail]))
				tail--;
			q[++tail]=now;
		}		
		//process the head
		while(head<tail&&G(q[head],q[head+1])>=i*S(q[head],q[head+1]))
			head++;
		dp[i]=dp[q[head]]+(pre[i]-pre[q[head]])-a[q[head]+1]*(i-q[head]);
	} 
	printf("%lld\n",dp[n]);
}
int main()
{
	int t;
	scanf("%d",&t);
	while(t--)
	{
		scanf("%d%d",&n,&len);
		for(int i=1;i<=n;i++)
		{
			scanf("%lld",&a[i]);
			pre[i]=pre[i-1]+a[i];
		}
		Solve();
	}
	return 0;
}
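/*
Let j<k, and suppose decision k is better than j.
Then: dp[j]-dp[k]-sum[j]+sum[k]+j*a[j+1]-k*a[k+1]>=i*(a[j+1]-a[k+1])
Let G(j,k)=dp[j]-dp[k]-sum[j]+sum[k]+j*a[j+1]-k*a[k+1], S(j,k)=a[j+1]-a[k+1] (<=0)
Then G(j,k)>=i*S(j,k), i.e. G(j,k)/S(j,k)<=i (the inequality flips)
Let X(j,k)=G(j,k)/S(j,k)
So the slope equation is: X(j,k)<=i
That is, when j,k satisfy X(j,k)<=i, k is better than j
(sum here is pre[] in the code)
*/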
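To gain some confidence in the optimized loop beyond the two samples, one can compare it against the O(n^2) recurrence on many small random sorted inputs. This is a sketch (slowAnswer, fastAnswer and crossCheck are hypothetical names); fastAnswer copies the logic of Solve() but uses local vectors instead of global arrays so it can be called repeatedly. Small values are used on purpose so that equal elements, i.e. S = 0, occur often.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <random>
#include <vector>

// O(n^2) reference; a is 1-indexed (a[0] unused) and ascending.
long long slowAnswer(const std::vector<long long>& a, int k) {
    int n = static_cast<int>(a.size()) - 1;
    std::vector<long long> pre(n + 1, 0), dp(n + 1, LLONG_MAX);
    for (int i = 1; i <= n; ++i) pre[i] = pre[i - 1] + a[i];
    dp[0] = 0;
    for (int i = k; i <= n; ++i)
        for (int j = 0; j <= i - k; ++j)
            if (dp[j] != LLONG_MAX)  // skip prefixes that cannot be split
                dp[i] = std::min(dp[i], dp[j] + pre[i] - pre[j] - (i - j) * a[j + 1]);
    return dp[n];
}

// Monotone-queue version: same logic as Solve(), with local arrays.
long long fastAnswer(const std::vector<long long>& a, int k) {
    int n = static_cast<int>(a.size()) - 1;
    std::vector<long long> pre(n + 1, 0), dp(n + 1, 0);
    for (int i = 1; i <= n; ++i) pre[i] = pre[i - 1] + a[i];
    auto G = [&](int j, int l) {
        return dp[j] - dp[l] - pre[j] + pre[l] + j * a[j + 1] - l * a[l + 1];
    };
    auto S = [&](int j, int l) { return a[j + 1] - a[l + 1]; };
    std::vector<int> q(n + 1, 0);  // q[0] = 0
    int head = 0, tail = 0;
    for (int i = k; i <= n; ++i) {
        int now = i - k;
        if (now >= k) {  // push before querying, as in Solve()
            while (head < tail && G(q[tail - 1], q[tail]) * S(q[tail], now) >=
                                      G(q[tail], now) * S(q[tail - 1], q[tail]))
                tail--;
            q[++tail] = now;
        }
        while (head < tail && G(q[head], q[head + 1]) >= i * S(q[head], q[head + 1]))
            head++;
        int j = q[head];
        dp[i] = dp[j] + pre[i] - pre[j] - a[j + 1] * (i - j);
    }
    return dp[n];
}

// Random sorted inputs with many duplicates; returns how many cases disagree.
int crossCheck(int trials, unsigned seed) {
    std::mt19937 rng(seed);
    int mismatches = 0;
    for (int t = 0; t < trials; ++t) {
        int n = 2 + static_cast<int>(rng() % 12);       // 2..13
        int k = 2 + static_cast<int>(rng() % (n - 1));  // 2..n
        std::vector<long long> a(n + 1, 0);
        for (int i = 1; i <= n; ++i) a[i] = rng() % 10;
        std::sort(a.begin() + 1, a.end());
        if (slowAnswer(a, k) != fastAnswer(a, k)) ++mismatches;
    }
    return mismatches;
}
```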

 
