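Defending Against XSS Attacks with AntiSamy (Part 2)

Part 1 covered how to use AntiSamy to defend against XSS attacks. Part 2 gives a brief analysis of how AntiSamy's filtering works.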

- Analysis of the basic steps
- Project source code
  - Scan flow
  - How the HTML is parsed into a DOM

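Analysis of the basic steps

Starting from the most basic implementation in Part 1, we can work out what actually happens underneath. The snippet below is easy to follow; the comments mark each step.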

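import java.util.List;
import org.owasp.validator.html.AntiSamy;
import org.owasp.validator.html.CleanResults;
import org.owasp.validator.html.Policy;

// Load the filtering policy
Policy policy = Policy.getInstance("file"); // "file" is the policy file to use, e.g. antisamy-ebay.xml
// Filter the HTML input
AntiSamy antisamy = new AntiSamy();
CleanResults cr = antisamy.scan(innerHtml, policy);
// Read back the sanitized HTML and any error messages
String cleanHtml = cr.getCleanHTML();
List<String> errMsgs = cr.getErrorMessages();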

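Project source code

The key part is the parsing process. I looked up some material online and read the project source alongside it, and confirmed that the strategy is to parse the HTML into a DOM tree and then scan that tree. The relevant source code is as follows.

Scan flow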

  public CleanResults scan(String html)
    throws ScanException
  {
    if (html == null) {
      throw new ScanException(new NullPointerException("Null input"));
    }

    this.errorMessages.clear();
    int maxInputSize = this.policy.getMaxInputSize();

    if (maxInputSize < html.length()) {
      addError("error.size.toolarge", new Object[] { Integer.valueOf(html.length()), Integer.valueOf(maxInputSize) });
      throw new ScanException((String)this.errorMessages.get(0));
    }

    this.isNofollowAnchors = this.policy.isNofollowAnchors();
    this.isValidateParamAsEmbed = this.policy.isValidateParamAsEmbed();

    long startOfScan = System.currentTimeMillis();
    try
    {
      CachedItem cachedItem = (CachedItem)cachedItems.poll();
      if (cachedItem == null) {
        cachedItem = new CachedItem();
      }

      html = stripNonValidXMLCharacters(html, cachedItem.invalidXmlCharMatcher);

      DOMFragmentParser parser = cachedItem.getDomFragmentParser();
      try
      {
        parser.parse(new InputSource(new StringReader(html)), this.dom);
      } catch (Exception e) {
        throw new ScanException(e);
      }

      processChildren(this.dom, 0);

      String trimmedHtml = html;

      StringWriter out = new StringWriter();

      OutputFormat format = getOutputFormat();

      HTMLSerializer serializer = getHTMLSerializer(out, format);
      serializer.serialize(this.dom);

      final String trimmed = trim(trimmedHtml, out.getBuffer().toString());

      // Wrap the cleaned HTML so CleanResults can fetch it on demand
      Callable<String> cleanHtml = new Callable<String>() {
        public String call() throws Exception {
          return trimmed;
        }
      };
      this.results = new CleanResults(startOfScan, cleanHtml, this.dom, this.errorMessages);

      cachedItems.add(cachedItem);
      return this.results;
    }
    catch (SAXException e)
    {
      throw new ScanException(e);
    }
    catch (IOException e) {
      throw new ScanException(e);
    }
  }

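The logic of this code is fairly clear. In brief:

Check that the input HTML is not null; throw an error if it is
Check that the input HTML does not exceed the maximum length; throw an error if it does
Strip characters that are not valid in XML, such as low-order non-printable characters, so that parsing does not fail
Parse the HTML into a DOM tree
Process the DOM tree and return the clean HTML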
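
The first three steps above can be sketched with the JDK alone. This is a minimal illustration, not AntiSamy's own code: the class name PreScanChecks and the method prepare are hypothetical, IllegalArgumentException stands in for ScanException, and the regular expression is derived from the XML 1.0 definition of a legal character rather than copied from the library.

```java
import java.util.regex.Pattern;

public class PreScanChecks {
    // Matches any code point outside the XML 1.0 legal character ranges
    // (tab, LF, CR, U+0020..U+D7FF, U+E000..U+FFFD, U+10000..U+10FFFF).
    private static final Pattern INVALID_XML_CHARS = Pattern.compile(
        "[^\\x09\\x0A\\x0D\\x20-\\uD7FF\\uE000-\\uFFFD\\x{10000}-\\x{10FFFF}]");

    /** Hypothetical helper: null check, size check, then character stripping. */
    public static String prepare(String html, int maxInputSize) {
        if (html == null) {
            throw new IllegalArgumentException("Null input");
        }
        if (html.length() > maxInputSize) {
            throw new IllegalArgumentException(
                "Input of " + html.length() + " chars exceeds the limit of " + maxInputSize);
        }
        // Remove characters the parser could choke on; everything else is kept.
        return INVALID_XML_CHARS.matcher(html).replaceAll("");
    }
}
```

Calling PreScanChecks.prepare(input, 100000) returns the input with control characters such as NUL or backspace removed, while tabs, line breaks, and ordinary text pass through unchanged. The size limit here is an arbitrary value; in AntiSamy it comes from the policy file.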

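How the HTML is parsed into a DOM

For the part of the code that works on the parsed DOM tree, I consulted related material. The approach is as follows: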

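4 Traverse the DOM tree and handle each node as follows:
4.1 Check whether the depth has reached 250; if so, raise an exception.
4.2 Check whether the node is a comment; if so, handle it specially and return.
4.2.1 Check whether comments should be shown; if not, delete the comment node.
4.2.2 If the comment should be shown, filter the node's data.
4.3 Check whether the node is an empty element and, depending on the settings, decide whether to delete it.
4.4 Check whether the node is CDATA; if so, filter the node's data.
4.4.1 Record an error and create a text node to replace the CDATA node.
4.5 Check whether the node is a ProcessingInstruction; if so, delete it.
4.6 Look up the tagRule for the current node and handle the node according to that rule.
4.6.1 Check whether the default tagRule applies.
4.6.2 Either encode the node (the node's content is encoded),
4.6.3 or filter the node (delete the node but keep its children),
4.6.4 or validate the node:
 style nodes get special handling;
 for other nodes, each attribute is validated in turn, and if validation fails, the node is handled according to the configuration:
4.6.4.1 removeTag: delete the current element
4.6.4.2 filterTag: filter the current element
4.6.4.3 encodeTag: encode the element
4.6.4.4 removeAttribute: the default, delete the attribute
4.6.5 or truncate the node (delete the node's attributes and every child that is not a text node),
4.6.6 or remove the node (together with all of its children).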
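
The traversal outlined above can be sketched with the JDK's own DOM classes. Several things here are assumptions made for illustration: AntiSamy parses real HTML with a fragment parser, whereas this sketch uses the JDK XML parser and therefore needs well-formed input; the class DomWalkSketch, the TAG_ACTIONS table, and the rule "drop any attribute whose name starts with on" are hypothetical stand-ins for a real policy. Only the depth limit, comments, CDATA, processing instructions, and the remove, filter, and validate actions are covered.

```java
import java.io.StringReader;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class DomWalkSketch {
    static final int MAX_DEPTH = 250;
    // Hypothetical tag rules; any tag not listed here is filtered.
    static final Map<String, String> TAG_ACTIONS =
        Map.of("b", "validate", "p", "validate", "script", "remove");

    /** Parses well-formed XML and cleans everything below the root element. */
    public static Element clean(String xml) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Refuse DOCTYPE declarations so untrusted input cannot pull in external entities.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Input could not be parsed", e);
        }
        Element root = doc.getDocumentElement();
        processChildren(root, 1);
        return root;
    }

    static void processChildren(Node parent, int depth) {
        Node child = parent.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // read before child may be detached
            process(child, depth);
            child = next;
        }
    }

    static void process(Node node, int depth) {
        if (depth > MAX_DEPTH) {                          // step 4.1
            throw new IllegalStateException("DOM nesting is too deep");
        }
        Node parent = node.getParentNode();
        switch (node.getNodeType()) {
            case Node.COMMENT_NODE:                       // step 4.2.1 (comments hidden)
            case Node.PROCESSING_INSTRUCTION_NODE:        // step 4.5
                parent.removeChild(node);
                return;
            case Node.CDATA_SECTION_NODE:                 // step 4.4.1
                Node text = node.getOwnerDocument().createTextNode(node.getNodeValue());
                parent.replaceChild(text, node);
                return;
            case Node.ELEMENT_NODE:
                break;
            default:
                return;                                   // plain text is left alone
        }
        String action = TAG_ACTIONS.getOrDefault(node.getNodeName().toLowerCase(), "filter");
        if (action.equals("remove")) {                    // step 4.6.6
            parent.removeChild(node);
        } else if (action.equals("filter")) {             // step 4.6.3
            processChildren(node, depth + 1);             // clean children before promoting them
            while (node.getFirstChild() != null) {
                parent.insertBefore(node.getFirstChild(), node);
            }
            parent.removeChild(node);
        } else {                                          // step 4.6.4, default removeAttribute
            NamedNodeMap attrs = node.getAttributes();
            for (int i = attrs.getLength() - 1; i >= 0; i--) {
                String name = attrs.item(i).getNodeName();
                if (name.toLowerCase().startsWith("on")) {
                    attrs.removeNamedItem(name);
                }
            }
            processChildren(node, depth + 1);
        }
    }
}
```

The next sibling is read before each child is processed because removing or replacing a node detaches it from the tree. A filtered element has its children cleaned first and promoted afterwards, so the promoted nodes are not visited twice.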

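If you only want to use the project to filter text, you do not need to dwell on the parsing details; knowing the overall flow is enough.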

Vi_error
