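A Bit of Curiosity About Linux (4): How tail -f Follows a File

File following comes up in many practical situations: watching a system log so that new output shows up the moment it is written, which helps with troubleshooting, or simply checking what is at the end of a file. Either way, it is an everyday necessity.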

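1. Implementing file tracking ourselves

We run into similar requirements in ordinary feature development. For example, after someone transfers a file to a certain location, we need to trigger follow-up processing.

If we implement this ourselves, about all we can do is check periodically whether the file has changed, for example by comparing its last modification time. If the action must wait until the transfer has finished, the usual convention is for the uploader to drop an empty done file as a marker that the transfer is complete.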
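As a rough illustration of this homegrown approach, here is a minimal C sketch. The names struct snapshot, file_changed and upload_done are hypothetical helpers invented for this example, and comparing the file size in addition to the modification time is an added assumption, so that two writes within the same second are still noticed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical record of what we saw the last time we checked a file.  */
struct snapshot
{
  time_t mtime;   /* last modification time, in seconds */
  off_t size;     /* file size in bytes */
};

/* Return true if PATH's mtime or size differs from *SNAP, and update
   *SNAP to the new values.  Return false if nothing changed or if the
   file cannot be examined.  */
static bool
file_changed (const char *path, struct snapshot *snap)
{
  struct stat st;

  assert (path != NULL && snap != NULL);
  if (stat (path, &st) != 0)
    {
      perror (path);
      return false;
    }
  if (st.st_mtime == snap->mtime && st.st_size == snap->size)
    return false;

  snap->mtime = st.st_mtime;
  snap->size = st.st_size;
  return true;
}

/* The transfer counts as complete once the empty marker file exists.  */
static bool
upload_done (const char *done_marker_path)
{
  struct stat st;

  assert (done_marker_path != NULL);
  return stat (done_marker_path, &st) == 0;
}
```

A caller would invoke file_changed in a loop with a sleep between rounds, and only start processing once upload_done reports that the marker is present.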

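What about the system's implementation? As the title asks, does tail -f follow files the same way? It seems unlikely to be that simple; surely the system tools are a bit smarter than our homegrown approach.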

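2. Where the tail -f source lives

Every Linux installation ships with many basic commands, such as cat/vi/sh/top/tail... Are these commands provided by the kernel? They are not. The Linux kernel does not implement them; they are userland programs maintained in separate packages. tail (like cat) belongs to GNU coreutils, while tools such as vi, sh and top come from other packages. This is worth keeping in mind when you want to read the implementation of such a tool, because the source may not be where you first look.

Source: http://git.savannah.gnu.org/cgit/coreutils.git/tree/src/tail.c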

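3. How tail -f is implemented

Although tail ships alongside the operating system as a core utility, it is an independent program, so it has to account for the environment it runs in, and its implementation branches accordingly.

That is, there is more than one implementation. The first works like ours: check the file for changes at intervals and write the new content to the console. The second is more sophisticated: it uses the file notification facility provided by the operating system (inotify on Linux) to output content as soon as it arrives. The entry point looks like this: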

int
main (int argc, char **argv)
{
  enum header_mode header_mode = multiple_files;
  bool ok = true;
  /* If from_start, the number of items to skip before printing; otherwise,
     the number of items at the end of the file to print.  Although the type
     is signed, the value is never negative.  */
  uintmax_t n_units = DEFAULT_N_LINES;
  size_t n_files;
  char **file;
  struct File_spec *F;
  size_t i;
  bool obsolete_option;

  /* The number of seconds to sleep between iterations.
     During one iteration, every file name or descriptor is checked to
     see if it has changed.  */
  double sleep_interval = 1.0;

  initialize_main (&argc, &argv);
  set_program_name (argv[0]);
  setlocale (LC_ALL, "");
  bindtextdomain (PACKAGE, LOCALEDIR);
  textdomain (PACKAGE);

  atexit (close_stdout);

  have_read_stdin = false;

  count_lines = true;
  forever = from_start = print_headers = false;
  line_end = '\n';
  obsolete_option = parse_obsolete_option (argc, argv, &n_units);
  argc -= obsolete_option;
  argv += obsolete_option;
  parse_options (argc, argv, &n_units, &header_mode, &sleep_interval);

  /* To start printing with item N_UNITS from the start of the file, skip
     N_UNITS - 1 items.  'tail -n +0' is actually meaningless, but for Unix
     compatibility it's treated the same as 'tail -n +1'.  */
  if (from_start)
    {
      if (n_units)
        --n_units;
    }

  if (optind < argc)
    {
      n_files = argc - optind;
      file = argv + optind;
    }
  else
    {
      static char *dummy_stdin = (char *) "-";
      n_files = 1;
      file = &dummy_stdin;
    }

  {
    bool found_hyphen = false;

    for (i = 0; i < n_files; i++)
      if (STREQ (file[i], "-"))
        found_hyphen = true;

    /* When following by name, there must be a name.  */
    if (found_hyphen && follow_mode == Follow_name)
      die (EXIT_FAILURE, 0, _("cannot follow %s by name"), quoteaf ("-"));

    /* When following forever, and not using simple blocking, warn if
       any file is '-' as the stats() used to check for input are ineffective.
       This is only a warning, since tail's output (before a failing seek,
       and that from any non-stdin files) might still be useful.  */
    if (forever && found_hyphen)
      {
        struct stat in_stat;
        bool blocking_stdin;
        blocking_stdin = (pid == 0 && follow_mode == Follow_descriptor
                          && n_files == 1 && ! fstat (STDIN_FILENO, &in_stat)
                          && ! S_ISREG (in_stat.st_mode));

        if (! blocking_stdin && isatty (STDIN_FILENO))
          error (0, 0, _("warning: following standard input"
                         " indefinitely is ineffective"));
      }
  }

  /* Don't read anything if we'll never output anything.  */
  if (! n_units && ! forever && ! from_start)
    return EXIT_SUCCESS;

  F = xnmalloc (n_files, sizeof *F);
  for (i = 0; i < n_files; i++)
    F[i].name = file[i];

  if (header_mode == always
      || (header_mode == multiple_files && n_files > 1))
    print_headers = true;

  xset_binary_mode (STDOUT_FILENO, O_BINARY);

  for (i = 0; i < n_files; i++)
    ok &= tail_file (&F[i], n_units);

  if (forever && ignore_fifo_and_pipe (F, n_files))
    {
      /* If stdout is a fifo or pipe, then monitor it
         so that we exit if the reader goes away.  */
      struct stat out_stat;
      if (fstat (STDOUT_FILENO, &out_stat) < 0)
        die (EXIT_FAILURE, errno, _("standard output"));
      monitor_output = (S_ISFIFO (out_stat.st_mode)
                        || (HAVE_FIFO_PIPES != 1 && isapipe (STDOUT_FILENO)));

#if HAVE_INOTIFY
      /* tailable_stdin() checks if the user specifies stdin via  "-",
         or implicitly by providing no arguments. If so, we won't use inotify.
         Technically, on systems with a working /dev/stdin, we *could*,
         but would it be worth it?  Verifying that it's a real device
         and hooked up to stdin is not trivial, while reverting to
         non-inotify-based tail_forever is easy and portable.

         any_remote_file() checks if the user has specified any
         files that reside on remote file systems.  inotify is not used
         in this case because it would miss any updates to the file
         that were not initiated from the local system.

         any_non_remote_file() checks if the user has specified any
         files that don't reside on remote file systems.  inotify is not used
         if there are no open files, as we can't determine if those file
         will be on a remote file system.

         any_symlinks() checks if the user has specified any symbolic links.
         inotify is not used in this case because it returns updated _targets_
         which would not match the specified names.  If we tried to always
         use the target names, then we would miss changes to the symlink itself.

         ok is false when one of the files specified could not be opened for
         reading.  In this case and when following by descriptor,
         tail_forever_inotify() cannot be used (in its current implementation).

         FIXME: inotify doesn't give any notification when a new
         (remote) file or directory is mounted on top a watched file.
         When follow_mode == Follow_name we would ideally like to detect that.
         Note if there is a change to the original file then we'll
         recheck it and follow the new file, or ignore it if the
         file has changed to being remote.

         FIXME-maybe: inotify has a watch descriptor per inode, and hence with
         our current hash implementation will only --follow data for one
         of the names when multiple hardlinked files are specified, or
         for one name when a name is specified multiple times.  */
      if (!disable_inotify && (tailable_stdin (F, n_files)
                               || any_remote_file (F, n_files)
                               || ! any_non_remote_file (F, n_files)
                               || any_symlinks (F, n_files)
                               || any_non_regular_fifo (F, n_files)
                               || (!ok && follow_mode == Follow_descriptor)))
        disable_inotify = true;

      if (!disable_inotify)
        {
          int wd = inotify_init ();
          if (0 <= wd)
            {
              /* Flush any output from tail_file, now, since
                 tail_forever_inotify flushes only after writing,
                 not before reading.  */
              if (fflush (stdout) != 0)
                die (EXIT_FAILURE, errno, _("write error"));

              Hash_table *ht;
              tail_forever_inotify (wd, F, n_files, sleep_interval, &ht);
              hash_free (ht);
              close (wd);
              errno = 0;
            }
          error (0, errno, _("inotify cannot be used, reverting to polling"));
        }
#endif
      disable_inotify = true;
      tail_forever (F, n_files, sleep_interval);
    }

  if (have_read_stdin && close (STDIN_FILENO) < 0)
    die (EXIT_FAILURE, errno, "-");
  main_exit (ok ? EXIT_SUCCESS : EXIT_FAILURE);
}

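In short, the two implementations are tail_forever, which follows by polling, and tail_forever_inotify, which is notified on update. Note that main falls back to tail_forever whenever inotify is disabled or tail_forever_inotify returns. Let's take a quick look at each.

3.1. Following by periodic scanning

The main difference from our own version is that this scan follows multiple files, whereas a homegrown implementation usually only follows one.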

/* Tail N_FILES files forever, or until killed.
   The pertinent information for each file is stored in an entry of F.
   Loop over each of them, doing an fstat to see if they have changed size,
   and an occasional open/fstat to see if any dev/ino pair has changed.
   If none of them have changed size in one iteration, sleep for a
   while and try again.  Continue until the user interrupts us.  */

static void
tail_forever (struct File_spec *f, size_t n_files, double sleep_interval)
{
  /* Use blocking I/O as an optimization, when it's easy.  */
  bool blocking = (pid == 0 && follow_mode == Follow_descriptor
                   && n_files == 1 && f[0].fd != -1 && ! S_ISREG (f[0].mode));
  size_t last;
  bool writer_is_dead = false;

  last = n_files - 1;
  // Loop until the user kills the process, no files remain, or the --pid writer has died
  while (true)
    {
      size_t i;
      bool any_input = false;

      for (i = 0; i < n_files; i++)
        {
          int fd;
          char const *name;
          mode_t mode;
          struct stat stats;
          uintmax_t bytes_read;

          if (f[i].ignore)
            continue;

          if (f[i].fd < 0)
            {
              recheck (&f[i], blocking);
              continue;
            }

          fd = f[i].fd;
          name = pretty_name (&f[i]);
          mode = f[i].mode;

          if (f[i].blocking != blocking)
            {
              int old_flags = fcntl (fd, F_GETFL);
              int new_flags = old_flags | (blocking ? 0 : O_NONBLOCK);
              if (old_flags < 0
                  || (new_flags != old_flags
                      && fcntl (fd, F_SETFL, new_flags) == -1))
                {
                  /* Don't update f[i].blocking if fcntl fails.  */
                  if (S_ISREG (f[i].mode) && errno == EPERM)
                    {
                      /* This happens when using tail -f on a file with
                         the append-only attribute.  */
                    }
                  else
                    die (EXIT_FAILURE, errno,
                         _("%s: cannot change nonblocking mode"),
                         quotef (name));
                }
              else
                f[i].blocking = blocking;
            }

          if (!f[i].blocking)
            {
              // Use fstat to fetch the file's current status into stats
              if (fstat (fd, &stats) != 0)
                {
                  f[i].fd = -1;
                  f[i].errnum = errno;
                  error (0, errno, "%s", quotef (name));
                  close (fd); /* ignore failure */
                  continue;
                }
              // Unchanged if mode, size (for regular files), and mtime all match
              if (f[i].mode == stats.st_mode
                  && (! S_ISREG (stats.st_mode) || f[i].size == stats.st_size)
                  && timespec_cmp (f[i].mtime, get_stat_mtime (&stats)) == 0)
                {
                  if ((max_n_unchanged_stats_between_opens
                       <= f[i].n_unchanged_stats++)
                      && follow_mode == Follow_name)
                    {
                      recheck (&f[i], f[i].blocking);
                      f[i].n_unchanged_stats = 0;
                    }
                  continue;
                }

              /* This file has changed.  Print out what we can, and
                 then keep looping.  */
              // Record the state seen at this latest change
              f[i].mtime = get_stat_mtime (&stats);
              f[i].mode = stats.st_mode;

              /* reset counter */
              f[i].n_unchanged_stats = 0;

              /* XXX: This is only a heuristic, as the file may have also
                 been truncated and written to if st_size >= size
                 (in which case we ignore new data <= size).  */
              if (S_ISREG (mode) && stats.st_size < f[i].size)
                {
                  error (0, 0, _("%s: file truncated"), quotef (name));
                  /* Assume the file was truncated to 0,
                     and therefore output all "new" data.  */
                  xlseek (fd, 0, SEEK_SET, name);
                  f[i].size = 0;
                }

              if (i != last)
                {
                  if (print_headers)
                    write_header (name);
                  last = i;
                }
            }

          /* Don't read more than st_size on networked file systems
             because it was seen on glusterfs at least, that st_size
             may be smaller than the data read on a _subsequent_ stat call.  */
          uintmax_t bytes_to_read;
          if (f[i].blocking)
            bytes_to_read = COPY_A_BUFFER;
          else if (S_ISREG (mode) && f[i].remote)
            bytes_to_read = stats.st_size - f[i].size;
          else
            bytes_to_read = COPY_TO_EOF;
          // Write the newly appended content to standard output
          bytes_read = dump_remainder (false, name, fd, bytes_to_read);

          any_input |= (bytes_read != 0);
          f[i].size += bytes_read;
        }

      if (! any_live_files (f, n_files))
        {
          error (0, 0, _("no files remaining"));
          break;
        }

      if ((!any_input || blocking) && fflush (stdout) != 0)
        die (EXIT_FAILURE, errno, _("write error"));

      check_output_alive ();

      /* If nothing was read, sleep and/or check for dead writers.  */
      if (!any_input)
        {
          if (writer_is_dead)
            break;

          /* Once the writer is dead, read the files once more to
             avoid a race condition.  */
          writer_is_dead = (pid != 0
                            && kill (pid, 0) != 0
                            /* Handle the case in which you cannot send a
                               signal to the writer, so kill fails and sets
                               errno to EPERM.  */
                            && errno != EPERM);
          // Sleep until the next polling round
          if (!writer_is_dead && xnanosleep (sleep_interval))
            die (EXIT_FAILURE, errno, _("cannot read realtime clock"));

        }
    }
}

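When we run tail -f, the console stays on the output and waits for new data; the tail process keeps running in the foreground and occupies the terminal. To stop following, the user has to terminate the process, with Ctrl+C or some other means. So the implementation is fairly simple and does what it says: repeatedly check the files and output new content, and skip any file that has become unavailable.

Detection relies mainly on fstat (fd, &stats): a file is treated as unchanged when its mode, size (for regular files), and mtime all match the previously recorded values. Roughly what one would expect.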
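To make the polling idea concrete, here is a stripped-down sketch of a single polling round for one file. It is not the coreutils code: open_for_follow and poll_once are hypothetical names, only the size is compared (tail_forever also checks mode and mtime), and the caller is assumed to keep the descriptor's read offset equal to *seen.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: open PATH read-only for following.  */
static int
open_for_follow (const char *path)
{
  assert (path != NULL);
  return open (path, O_RDONLY);
}

/* Hypothetical helper: one polling round for FD.  *SEEN is the number
   of bytes already consumed.  Copies at most CAP newly appended bytes
   into BUF and returns the count, 0 if there is nothing new, or -1 on
   error.  */
static ssize_t
poll_once (int fd, off_t *seen, char *buf, size_t cap)
{
  struct stat st;

  assert (seen != NULL && buf != NULL);
  if (fstat (fd, &st) != 0)
    return -1;

  if (st.st_size < *seen)
    {
      /* The file shrank: assume it was truncated to 0 and start over,
         the same heuristic tail_forever uses.  */
      if (lseek (fd, 0, SEEK_SET) < 0)
        return -1;
      *seen = 0;
    }

  if (st.st_size == *seen)
    return 0;               /* nothing new this round */

  ssize_t n = read (fd, buf, cap);
  if (n > 0)
    *seen += n;
  return n;
}
```

A follower would call poll_once in a loop, write whatever it returns to standard output, and sleep for the chosen interval whenever the result is 0.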

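3.2. Following by asynchronous notification

The previous implementation polls; this one follows files through notifications. Polling requires a suitable interval: too long and changes are noticed late, too short and the system is put under load. Notification is more elegant: an event is delivered only when the file changes, and the rest of the time the process consumes almost no resources (although waiting for events still has a small cost). Let's take a look.

main first calls inotify_init () and then hands the resulting descriptor to tail_forever_inotify: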

/* Attempt to tail N_FILES files forever, or until killed.
   Check modifications using the inotify events system.
   Exit if finished or on fatal error; return to revert to polling.  */
static void
tail_forever_inotify (int wd, struct File_spec *f, size_t n_files,
                      double sleep_interval, Hash_table **wd_to_namep)
{
# if TAIL_TEST_SLEEP
  /* Delay between open() and inotify_add_watch()
     to help trigger different cases.  */
  xnanosleep (1000000);
# endif
  unsigned int max_realloc = 3;

  /* Map an inotify watch descriptor to the name of the file it's watching.  */
  Hash_table *wd_to_name;

  bool found_watchable_file = false;
  bool tailed_but_unwatchable = false;
  bool found_unwatchable_dir = false;
  bool no_inotify_resources = false;
  bool writer_is_dead = false;
  struct File_spec *prev_fspec;
  size_t evlen = 0;
  char *evbuf;
  size_t evbuf_off = 0;
  size_t len = 0;

  wd_to_name = hash_initialize (n_files, NULL, wd_hasher, wd_comparator, NULL);
  if (! wd_to_name)
    xalloc_die ();
  *wd_to_namep = wd_to_name;

  /* The events mask used with inotify on files (not directories).  */
  uint32_t inotify_wd_mask = IN_MODIFY;
  /* TODO: Perhaps monitor these events in Follow_descriptor mode also,
     to tag reported file names with "deleted", "moved" etc.  */
  if (follow_mode == Follow_name)
    inotify_wd_mask |= (IN_ATTRIB | IN_DELETE_SELF | IN_MOVE_SELF);

  /* Add an inotify watch for each watched file.  If -F is specified then watch
     its parent directory too, in this way when they re-appear we can add them
     again to the watch list.  */
  size_t i;
  // Add an inotify watch for each file in turn
  for (i = 0; i < n_files; i++)
    {
      if (!f[i].ignore)
        {
          size_t fnlen = strlen (f[i].name);
          if (evlen < fnlen)
            evlen = fnlen;

          f[i].wd = -1;

          if (follow_mode == Follow_name)
            {
              size_t dirlen = dir_len (f[i].name);
              char prev = f[i].name[dirlen];
              f[i].basename_start = last_component (f[i].name) - f[i].name;

              f[i].name[dirlen] = '\0';

               /* It's fine to add the same directory more than once.
                  In that case the same watch descriptor is returned.  */
              f[i].parent_wd = inotify_add_watch (wd, dirlen ? f[i].name : ".",
                                                  (IN_CREATE | IN_DELETE
                                                   | IN_MOVED_TO | IN_ATTRIB
                                                   | IN_DELETE_SELF));

              f[i].name[dirlen] = prev;

              if (f[i].parent_wd < 0)
                {
                  if (errno != ENOSPC) /* suppress confusing error.  */
                    error (0, errno, _("cannot watch parent directory of %s"),
                           quoteaf (f[i].name));
                  else
                    error (0, 0, _("inotify resources exhausted"));
                  found_unwatchable_dir = true;
                  /* We revert to polling below.  Note invalid uses
                     of the inotify API will still be diagnosed.  */
                  break;
                }
            }
          // Watch the file itself, using the event mask built above
          f[i].wd = inotify_add_watch (wd, f[i].name, inotify_wd_mask);

          if (f[i].wd < 0)
            {
              if (f[i].fd != -1)  /* already tailed.  */
                tailed_but_unwatchable = true;
              if (errno == ENOSPC || errno == ENOMEM)
                {
                  no_inotify_resources = true;
                  error (0, 0, _("inotify resources exhausted"));
                  break;
                }
              else if (errno != f[i].errnum)
                error (0, errno, _("cannot watch %s"), quoteaf (f[i].name));
              continue;
            }

          if (hash_insert (wd_to_name, &(f[i])) == NULL)
            xalloc_die ();
          // At least one file is watchable, so keep following via inotify
          found_watchable_file = true;
        }
    }

  /* Linux kernel 2.6.24 at least has a bug where eventually, ENOSPC is always
     returned by inotify_add_watch.  In any case we should revert to polling
     when there are no inotify resources.  Also a specified directory may not
     be currently present or accessible, so revert to polling.  Also an already
     tailed but unwatchable due rename/unlink race, should also revert.  */
  if (no_inotify_resources || found_unwatchable_dir
      || (follow_mode == Follow_descriptor && tailed_but_unwatchable))
    return;
  if (follow_mode == Follow_descriptor && !found_watchable_file)
    exit (EXIT_FAILURE);

  prev_fspec = &(f[n_files - 1]);

  /* Check files again.  New files or data can be available since last time we
     checked and before they are watched by inotify.  */
  for (i = 0; i < n_files; i++)
    {
      if (! f[i].ignore)
        {
          /* check for new files.  */
          if (follow_mode == Follow_name)
            recheck (&(f[i]), false);
          else if (f[i].fd != -1)
            {
              /* If the file was replaced in the small window since we tailed,
                 then assume the watch is on the wrong item (different to
                 that we've already produced output for), and so revert to
                 polling the original descriptor.  */
              struct stat stats;

              if (stat (f[i].name, &stats) == 0
                  && (f[i].dev != stats.st_dev || f[i].ino != stats.st_ino))
                {
                  error (0, errno, _("%s was replaced"),
                         quoteaf (pretty_name (&(f[i]))));
                  return;
                }
            }

          /* check for new data.  */
          check_fspec (&f[i], &prev_fspec);
        }
    }

  evlen += sizeof (struct inotify_event) + 1;
  evbuf = xmalloc (evlen);

  /* Wait for inotify events and handle them.  Events on directories
     ensure that watched files can be re-added when following by name.
     This loop blocks on the 'safe_read' call until a new event is notified.
     But when --pid=P is specified, tail usually waits via poll.  */
  while (true)
    {
      struct File_spec *fspec;
      struct inotify_event *ev;
      void *void_ev;

      /* When following by name without --retry, and the last file has
         been unlinked or renamed-away, diagnose it and return.  */
      if (follow_mode == Follow_name
          && ! reopen_inaccessible_files
          && hash_get_n_entries (wd_to_name) == 0)
        die (EXIT_FAILURE, 0, _("no files remaining"));

      if (len <= evbuf_off)
        {
          /* Poll for inotify events.  When watching a PID, ensure
             that a read from WD will not block indefinitely.
             If MONITOR_OUTPUT, also poll for a broken output pipe.  */

          int file_change;
          struct pollfd pfd[2];
          do
            {
              /* How many ms to wait for changes.  -1 means wait forever.  */
              int delay = -1;

              if (pid)
                {
                  if (writer_is_dead)
                    exit (EXIT_SUCCESS);

                  writer_is_dead = (kill (pid, 0) != 0 && errno != EPERM);

                  if (writer_is_dead || sleep_interval <= 0)
                    delay = 0;
                  else if (sleep_interval < INT_MAX / 1000 - 1)
                    {
                      /* delay = ceil (sleep_interval * 1000), sans libm.  */
                      double ddelay = sleep_interval * 1000;
                      delay = ddelay;
                      delay += delay < ddelay;
                    }
                }

              pfd[0].fd = wd;
              pfd[0].events = POLLIN;
              pfd[1].fd = STDOUT_FILENO;
              pfd[1].events = pfd[1].revents = 0;
              // Wait for inotify events.  The timeout is finite only with --pid;
              // otherwise delay is -1 and poll blocks until an event arrives
              file_change = poll (pfd, monitor_output + 1, delay);
            }
          while (file_change == 0);

          if (file_change < 0)
            die (EXIT_FAILURE, errno,
                 _("error waiting for inotify and output events"));
          if (pfd[1].revents)
            die_pipe ();

          len = safe_read (wd, evbuf, evlen);
          evbuf_off = 0;

          /* For kernels prior to 2.6.21, read returns 0 when the buffer
             is too small.  */
          if ((len == 0 || (len == SAFE_READ_ERROR && errno == EINVAL))
              && max_realloc--)
            {
              len = 0;
              evlen *= 2;
              evbuf = xrealloc (evbuf, evlen);
              continue;
            }

          if (len == 0 || len == SAFE_READ_ERROR)
            die (EXIT_FAILURE, errno, _("error reading inotify event"));
        }

      void_ev = evbuf + evbuf_off;
      ev = void_ev;
      evbuf_off += sizeof (*ev) + ev->len;

      /* If a directory is deleted, IN_DELETE_SELF is emitted
         with ev->name of length 0.
         We need to catch it, otherwise it would wait forever,
         as wd for directory becomes inactive. Revert to polling now.   */
      if ((ev->mask & IN_DELETE_SELF) && ! ev->len)
        {
          for (i = 0; i < n_files; i++)
            {
              if (ev->wd == f[i].parent_wd)
                {
                  error (0, 0,
                      _("directory containing watched file was removed"));
                  return;
                }
            }
        }
      // Find which watched file this directory event refers to
      if (ev->len) /* event on ev->name in watched directory.  */
        {
          size_t j;
          for (j = 0; j < n_files; j++)
            {
              /* With N=hundreds of frequently-changing files, this O(N^2)
                 process might be a problem.  FIXME: use a hash table?  */
              if (f[j].parent_wd == ev->wd
                  && STREQ (ev->name, f[j].name + f[j].basename_start))
                break;
            }

          /* It is not a watched file.  */
          if (j == n_files)
            continue;

          fspec = &(f[j]);

          int new_wd = -1;
          bool deleting = !! (ev->mask & IN_DELETE);

          if (! deleting)
            {
              /* Adding the same inode again will look up any existing wd.  */
              new_wd = inotify_add_watch (wd, f[j].name, inotify_wd_mask);
            }

          if (! deleting && new_wd < 0)
            {
              if (errno == ENOSPC || errno == ENOMEM)
                {
                  error (0, 0, _("inotify resources exhausted"));
                  return; /* revert to polling.  */
                }
              else
                {
                  /* Can get ENOENT for a dangling symlink for example.  */
                  error (0, errno, _("cannot watch %s"), quoteaf (f[j].name));
                }
              /* We'll continue below after removing the existing watch.  */
            }

          /* This will be false if only attributes of file change.  */
          bool new_watch;
          new_watch = (! deleting) && (fspec->wd < 0 || new_wd != fspec->wd);

          if (new_watch)
            {
              if (0 <= fspec->wd)
                {
                  inotify_rm_watch (wd, fspec->wd);
                  hash_remove (wd_to_name, fspec);
                }

              fspec->wd = new_wd;

              if (new_wd == -1)
                continue;

              /* If the file was moved then inotify will use the source file wd
                for the destination file.  Make sure the key is not present in
                the table.  */
              struct File_spec *prev = hash_remove (wd_to_name, fspec);
              if (prev && prev != fspec)
                {
                  if (follow_mode == Follow_name)
                    recheck (prev, false);
                  prev->wd = -1;
                  close_fd (prev->fd, pretty_name (prev));
                }

              if (hash_insert (wd_to_name, fspec) == NULL)
                xalloc_die ();
            }

          if (follow_mode == Follow_name)
            recheck (fspec, false);
        }
      else
        {
          struct File_spec key;
          key.wd = ev->wd;
          fspec = hash_lookup (wd_to_name, &key);
        }

      if (! fspec)
        continue;

      if (ev->mask & (IN_ATTRIB | IN_DELETE | IN_DELETE_SELF | IN_MOVE_SELF))
        {
          /* Note for IN_MOVE_SELF (the file we're watching has
             been clobbered via a rename) we leave the watch
             in place since it may still be part of the set
             of watched names.  */
          if (ev->mask & IN_DELETE_SELF)
            {
              inotify_rm_watch (wd, fspec->wd);
              hash_remove (wd_to_name, fspec);
            }

          /* Note we get IN_ATTRIB for unlink() as st_nlink decrements.
             The usual path is a close() done in recheck() triggers
             an IN_DELETE_SELF event as the inode is removed.
             However sometimes open() will succeed as even though
             st_nlink is decremented, the dentry (cache) is not updated.
             Thus we depend on the IN_DELETE event on the directory
             to trigger processing for the removed file.  */

          recheck (fspec, false);

          continue;
        }
      // Output the new content
      check_fspec (fspec, &prev_fspec);
    }
}

/* Output (new) data for FSPEC->fd.
   PREV_FSPEC records the last File_spec for which we output.  */
static void
check_fspec (struct File_spec *fspec, struct File_spec **prev_fspec)
{
  struct stat stats;
  char const *name;

  if (fspec->fd == -1)
    return;

  name = pretty_name (fspec);

  if (fstat (fspec->fd, &stats) != 0)
    {
      fspec->errnum = errno;
      close_fd (fspec->fd, name);
      fspec->fd = -1;
      return;
    }

  /* XXX: This is only a heuristic, as the file may have also
     been truncated and written to if st_size >= size
     (in which case we ignore new data <= size).
     Though in the inotify case it's more likely we'll get
     separate events for truncate() and write().  */
  if (S_ISREG (fspec->mode) && stats.st_size < fspec->size)
    {
      error (0, 0, _("%s: file truncated"), quotef (name));
      xlseek (fspec->fd, 0, SEEK_SET, name);
      fspec->size = 0;
    }
  else if (S_ISREG (fspec->mode) && stats.st_size == fspec->size
           && timespec_cmp (fspec->mtime, get_stat_mtime (&stats)) == 0)
    return;

  bool want_header = print_headers && (fspec != *prev_fspec);

  uintmax_t bytes_read = dump_remainder (want_header, name, fspec->fd,
                                         COPY_TO_EOF);
  fspec->size += bytes_read;

  if (bytes_read)
    {
      *prev_fspec = fspec;
      if (fflush (stdout) != 0)
        die (EXIT_FAILURE, errno, _("write error"));
    }
}

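Following files through notifications is clearly more complex: first register the watches, then wait for events, then handle each event. But it is much cheaper when following a large number of files. Overall it is a good approach: once the watches are set up, the kernel does the work.

Compare the I/O models we know: blocking I/O, select/poll I/O, asynchronous I/O, and so on. Asynchronous approaches are always more complex to implement, but the payoff is usually substantial.

How the kernel registers the listeners is not explored here. Such mechanisms range from simple to complex: the simple form adds the listener to a per-file list and calls each one back on write, while the complex form involves things like I/O multiplexing, which cannot be explained in a few sentences. The epoll implementation is a good place to read more.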
