SQL Server discarding SPACE during GROUP BY

Question

It looks like SQL Server (tried on 2008 R2) is applying an RTRIM to columns in the GROUP BY clause. Has anyone else noticed this? Am I missing something here?

In the query below, the two SELECTs return the same result set, which I believe should not be the case.

declare @t table(Name varchar(100), Age int)
insert into @t values ('A', 20)
insert into @t values ('B', 30)
insert into @t values ('C', 40)
insert into @t values ('D', 25)
insert into @t values (' A', 21)
insert into @t values ('A ', 32)
insert into @t values (' A ', 28)

-- Group on the raw column
select
    Name,
    count(*) Count
from @t
group by Name

-- Group on the right-trimmed column
select
    rtrim(Name) RtrimmedName,
    count(*) Count
from @t
group by rtrim(Name)

Please let me know your thoughts...

Answer 1

It's actually doing the opposite (padding the shorter string rather than trimming the longer one), but the observable effect is the same.

When comparing two strings of unequal length, one of the rules of SQL (the standard, not just SQL Server) is that the shorter string is padded with spaces until it's the same length, and then the comparison is performed.

If you want to avoid being surprised, you'll need to add a non-space character at the end of each string.
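To make the padding rule and the suggested workaround concrete, here is a minimal Python sketch. It only models the comparison rule and is not how SQL Server implements it. The `with_sentinel` helper and its `|` character are hypothetical choices for illustration; any non-space character would do.

```python
def pad_space_equal(a: str, b: str) -> bool:
    """Compare two strings the PAD SPACE way: pad the shorter one with
    trailing spaces to the longer one's length, then compare."""
    width = max(len(a), len(b))
    return a.ljust(width) == b.ljust(width)


def with_sentinel(s: str, sentinel: str = "|") -> str:
    """Hypothetical helper: append a non-space character so any trailing
    spaces are no longer at the end of the string and survive padding."""
    return s + sentinel


# Trailing spaces are invisible to a PAD SPACE comparison...
print(pad_space_equal("A", "A "))    # True
# ...but leading spaces are not.
print(pad_space_equal("A", " A"))    # False
# With a sentinel appended, 'A' and 'A ' compare as different.
print(pad_space_equal(with_sentinel("A"), with_sentinel("A ")))  # False
```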

In fact, checking the standard text, it appears that there are two options:

4.6 Type conversions and mixing of data types

...

When values of unequal length are compared, if the collating sequence for the comparison has the NO PAD attribute and the shorter value is equal to a prefix of the longer value, then the shorter value is considered less than the longer value. If the collating sequence for the comparison has the PAD SPACE attribute, for the purposes of the comparison, the shorter value is effectively extended to the length of the longer by concatenation of <space>s on the right.

But all of the SQL Server collations I'm aware of are PAD SPACE.
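The two options in the quoted text can be sketched as one Python comparison function. This is a model under stated assumptions: characters are ordered by code point rather than by a real collation, and the `Pad` enum is just a label for the two attributes the standard names.

```python
from enum import Enum


class Pad(Enum):
    NO_PAD = "NO PAD"
    PAD_SPACE = "PAD SPACE"


def compare(a: str, b: str, pad: Pad) -> int:
    """Return -1, 0 or 1 as a is less than, equal to or greater than b.
    Characters are ordered by code point (a simplifying assumption)."""
    if pad is Pad.PAD_SPACE:
        # Extend the shorter value with spaces on the right.
        width = max(len(a), len(b))
        a, b = a.ljust(width), b.ljust(width)
    # For NO PAD, Python's ordering already treats a proper prefix as less
    # than the longer string, which matches the quoted rule.
    return (a > b) - (a < b)


print(compare("A", "A ", Pad.NO_PAD))     # -1
print(compare("A", "A ", Pad.PAD_SPACE))  # 0
```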

Answer 2

This is easier to see:

declare @t table (Name varchar(100), Age int);
insert @t values ('A', 20), ('B', 30), ('C', 40), ('D  ', 25),
                 (' A', 21), ('A ', 32), (' A ', 28), ('D    ', 10);

select Name, Replace(Name, ' ', '-'),
       count(*) Count
  from @t
 group by Name;

--
NAME  COLUMN_1  COUNT
A     -A        2
A     A-        2
B     B         1
C     C         1
D     D--       2

Notice the dash after A in the second row: it chose the 1-space version ('A ') over the 0-space one ('A').
Notice also that the D group returns the value with 2 trailing spaces rather than the one with 4.

So no, it's not performing an RTRIM. It is something of a soft bug, however: it arbitrarily picks one of the values in each group (the one it came across first) as the result of the GROUP BY, which could throw you off if the spaces mattered.
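A rough Python model of this behavior, assuming a PAD SPACE collation: values are grouped by their right-trimmed form, and each group reports the first member it sees plus a row count. This sketch keeps the first row in insertion order, so for the `A` group it reports `'A'`, whereas the SQL Server output above returned `'A '`; which member SQL Server returns depends on the order rows reach the aggregate, which is exactly the arbitrariness described.

```python
from __future__ import annotations


def group_by_pad_space(names: list[str]) -> dict[str, tuple[str, int]]:
    """Group values as a PAD SPACE collation would compare them and return
    {key: (first value seen, row count)}."""
    groups: dict[str, tuple[str, int]] = {}
    for name in names:
        key = name.rstrip(" ")  # trailing spaces only; leading spaces stay significant
        first, count = groups.get(key, (name, 0))
        groups[key] = (first, count + 1)
    return groups


names = ["A", "B", "C", "D  ", " A", "A ", " A ", "D    "]
for first, count in group_by_pad_space(names).values():
    print(repr(first), first.replace(" ", "-"), count)
# 'A' A 2
# 'B' B 1
# 'C' C 1
# 'D  ' D-- 2
# ' A' -A 2
```

Keying on `rstrip(" ")` is safe here because, under PAD SPACE, two strings compare equal exactly when they are equal once trailing spaces are removed.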

Testing it directly:

DECLARE @t TABLE
(
    Name VARCHAR(100),
    Age INT
);
INSERT @t
VALUES
('A', 20),
('B', 30),
('C', 40),
('D  ', 25),
(' A', 21),
('A ', 32),
(' A ', 28),
('D    ', 10);

SELECT Name,
       REPLACE(Name, ' ', '-'),
       COUNT(*) Count
FROM @t
GROUP BY Name;

-- List every row to see the raw values behind each group
SELECT Name, REPLACE(Name, ' ', '-') FROM @t;

You can also use DATALENGTH to confirm the actual length:

Note: The DATALENGTH() function counts both leading and trailing spaces when calculating the length of the expression.

SELECT Name, REPLACE(Name, ' ', '-'),DATALENGTH(Name) FROM @t
WHERE [Name] IN ('A','D')
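A small Python model of this last query, assuming single-byte (ASCII) `varchar` data so that DATALENGTH equals the character count. Under PAD SPACE, `IN ('A','D')` matches `'A'`, `'A '`, `'D  '` and `'D    '` but not the values with a leading space, and the byte counts then tell the matches apart. The helper names are hypothetical.

```python
from __future__ import annotations

# Rows inserted by the test script above, in insertion order.
names = ["A", "B", "C", "D  ", " A", "A ", " A ", "D    "]


def datalength_varchar(value: str) -> int:
    """Bytes used by a varchar value, assuming ASCII data (one byte per
    character). Leading and trailing spaces are both counted."""
    return len(value.encode("ascii"))


def pad_space_in(value: str, candidates: tuple[str, ...]) -> bool:
    """Model `value IN (...)` under PAD SPACE: trailing spaces are ignored,
    leading spaces are not."""
    return value.rstrip(" ") in {c.rstrip(" ") for c in candidates}


# Insertion order here; SQL Server guarantees no order without ORDER BY.
matches = [(n, datalength_varchar(n)) for n in names if pad_space_in(n, ("A", "D"))]
print(matches)  # [('A', 1), ('D  ', 3), ('A ', 2), ('D    ', 5)]
```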
