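This article is reposted from: http://blog.csdn.net/yming0221/article/details/7492423
In the previous post we took a high-level look at how the network stack is initialized in the Linux kernel. Here we take the same high-level view of how a single packet travels up through the network layers.
Recall how the OSI model and the TCP/IP model are layered: the OSI model has seven layers (physical, data link, network, transport, session, presentation, application), while the TCP/IP model folds them into four (network interface, internet, transport, application).
The previous post also showed how the kernel's own network stack is layered.
We start at the lowest layer and trace the path a packet takes upward.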
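1. Network interface layer
* The hardware listens on the physical medium and receives data. When the received data fills its buffer, the hardware raises an interrupt, and the system jumps to the interrupt service routine.
* In the interrupt service routine, the data is copied from the hardware buffer into a kernel-space buffer and wrapped in an sk_buff structure. The routine then calls netif_rx(), the interface function exposed to the driver layer, to hand the packet to the link layer. netif_rx() is implemented in net/inet/dev.c. (dev.c plays a central role in the whole network stack: it connects the driver layer below with the network layer above, and can be regarded as the implementation of the link-layer module.)
The function is implemented as follows: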
- /*
- * Receive a packet from a device driver and queue it for the upper
- * (protocol) levels. It always succeeds. This is the recommended
- * interface to use.
- * Passes data received from the device driver layer up to the
- * protocol layers above; this function is really just an interface.
- */
- void netif_rx(struct sk_buff *skb)
- {
- static int dropping = 0;
- /*
- * Any received buffers are un-owned and should be discarded
- * when freed. These will be updated later as the frames get
- * owners.
- */
- skb->sk = NULL;
- skb->free = 1;
- if(skb->stamp.tv_sec==0)
- skb->stamp = xtime;
- /*
- * Check that we aren't overdoing things.
- */
- if (!backlog_size)
- dropping = 0;
- else if (backlog_size > 300)
- dropping = 1;
- if (dropping)
- {
- kfree_skb(skb, FREE_READ);
- return;
- }
- /*
- * Add it to the "backlog" queue.
- */
- #ifdef CONFIG_SKB_CHECK
- IS_SKB(skb);
- #endif
- skb_queue_tail(&backlog,skb);//append to the backlog queue
- backlog_size++;
- /*
- * If any packet arrived, mark it for processing after the
- * hardware interrupt returns.
- */
- mark_bh(NET_BH);//the bottom half technique shortens the time spent in the interrupt handler
- return;
- }
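This function uses the bottom half technique. The idea is to split an interrupt handler into two parts: the top half does the time-critical work, and the bottom half does the work that can be finished later. This shortens the time spent in the interrupt handler and improves overall system performance. A later post will analyse the technique in detail.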
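To make the split concrete, here is a minimal user-space sketch of the same idea. It is a model, not kernel code: `struct packet`, `struct backlog`, `top_half_rx()` and `bottom_half_run()` are hypothetical names invented for this illustration, and `bh_pending` merely stands in for `mark_bh(NET_BH)`. The limit of 300 and the sticky `dropping` flag mirror what netif_rx() does above.

```c
#include <stddef.h>

/* Hypothetical model of the top-half / bottom-half split.
 * BACKLOG_MAX mirrors the hard-coded limit of 300 in netif_rx(). */
#define BACKLOG_MAX 300

struct packet {
    struct packet *next;
    int id;
};

struct backlog {
    struct packet *head, *tail;
    int size;
    int dropping;    /* sticky: stays set until the queue is empty again */
    int bh_pending;  /* stands in for mark_bh(NET_BH) */
};

/* "Top half": do the minimum and return quickly.
 * Returns 1 if the packet was queued, 0 if it was dropped. */
int top_half_rx(struct backlog *q, struct packet *p)
{
    if (q->size == 0)
        q->dropping = 0;
    else if (q->size > BACKLOG_MAX)
        q->dropping = 1;
    if (q->dropping)
        return 0;

    p->next = NULL;
    if (q->tail)
        q->tail->next = p;
    else
        q->head = p;
    q->tail = p;
    q->size++;
    q->bh_pending = 1;   /* ask for deferred processing */
    return 1;
}

/* "Bottom half": runs later and drains everything that was queued.
 * Returns the number of packets processed. */
int bottom_half_run(struct backlog *q, void (*handle)(struct packet *))
{
    int n = 0;

    if (!q->bh_pending)
        return 0;
    q->bh_pending = 0;
    while (q->head) {
        struct packet *p = q->head;
        q->head = p->next;
        if (!q->head)
            q->tail = NULL;
        q->size--;
        if (handle)
            handle(p);
        n++;
    }
    return n;
}
```

The sketch leaves out interrupt masking entirely. In the kernel the queue is shared with the interrupt handler, which is why net_bh() wraps its dequeue in cli()/sti().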
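As we saw in the previous post, network stack initialization already registered net_bh as the handler for the NET bottom half (around line 1375 of socket.c):
- bh_base[NET_BH].routine= net_bh;//set net_bh as the handler for the NET bottom half
* net_bh is implemented in net/inet/dev.c: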
- /*
- * When we are called the queue is ready to grab, the interrupts are
- * on and hardware can interrupt and queue to the receive queue a we
- * run with no problems.
- * This is run as a bottom half after an interrupt handler that does
- * mark_bh(NET_BH);
- */
- void net_bh(void *tmp)
- {
- struct sk_buff *skb;
- struct packet_type *ptype;
- struct packet_type *pt_prev;
- unsigned short type;
- /*
- * Atomically check and mark our BUSY state.
- */
- if (set_bit(1, (void*)&in_bh))//mark the BUSY state
- return;
- /*
- * Can we send anything now? We want to clear the
- * decks for any more sends that get done as we
- * process the input.
- */
- dev_transmit();//call dev_transmit() to send any pending outgoing data
- /*
- * Any data left to process. This may occur because a
- * mark_bh() is done after we empty the queue including
- * that from the device which does a mark_bh() just after
- */
- cli();//interrupts are disabled and re-enabled around queue operations to keep the queue consistent
- /*
- * While the queue is not empty
- */
- while((skb=skb_dequeue(&backlog))!=NULL)//dequeue until the queue is empty
- {
- /*
- * We have a packet. Therefore the queue has shrunk
- */
- backlog_size--;//one fewer element in the queue
- sti();
- /*
- * Bump the pointer to the next structure.
- * This assumes that the basic 'skb' pointer points to
- * the MAC header, if any (as indicated by its "length"
- * field). Take care now!
- */
- skb->h.raw = skb->data + skb->dev->hard_header_len;
- skb->len -= skb->dev->hard_header_len;
- /*
- * Fetch the packet protocol ID. This is also quite ugly, as
- * it depends on the protocol driver (the interface itself) to
- * know what the type is, or where to get it from. The Ethernet
- * interfaces fetch the ID from the two bytes in the Ethernet MAC
- * header (the h_proto field in struct ethhdr), but other drivers
- * may either use the ethernet ID's or extra ones that do not
- * clash (eg ETH_P_AX25). We could set this before we queue the
- * frame. In fact I may change this when I have time.
- */
- type = skb->dev->type_trans(skb, skb->dev);//extract the protocol type this packet belongs to
- /*
- * We got a packet ID. Now loop over the "known protocols"
- * table (which is actually a linked list, but this will
- * change soon if I get my way- FvK), and forward the packet
- * to anyone who wants it.
- *
- * [FvK didn't get his way but he is right this ought to be
- * hashed so we typically get a single hit. The speed cost
- * here is minimal but no doubt adds up at the 4,000+ pkts/second
- * rate we can hit flat out]
- */
- pt_prev = NULL;
- for (ptype = ptype_base; ptype != NULL; ptype = ptype->next) //walk the list of network protocols that ptype_base points to
- {
- //check whether the protocol number matches
- if ((ptype->type == type || ptype->type == htons(ETH_P_ALL)) && (!ptype->dev || ptype->dev==skb->dev))
- {
- /*
- * We already have a match queued. Deliver
- * to it and then remember the new match
- */
- if(pt_prev)
- {
- struct sk_buff *skb2;
- skb2=skb_clone(skb, GFP_ATOMIC);//clone the packet structure
- /*
- * Kick the protocol handler. This should be fast
- * and efficient code.
- */
- if(skb2)
- pt_prev->func(skb2, skb->dev, pt_prev);//call the handler of the matching protocol;
- //which one depends on the kind of network protocol,
- //e.g. for IP the handler is ip_rcv
- }
- /* Remember the current last to do */
- pt_prev=ptype;
- }
- } /* End of protocol list loop */
- /*
- * Is there a last item to send to ?
- */
- if(pt_prev)
- pt_prev->func(skb, skb->dev, pt_prev);
- /*
- * Has an unknown packet has been received ?
- */
- else
- kfree_skb(skb, FREE_WRITE);
- /*
- * Again, see if we can transmit anything now.
- * [Ought to take this out judging by tests it slows
- * us down not speeds us up]
- */
- dev_transmit();
- cli();
- } /* End of queue loop */
- /*
- * We have emptied the queue
- */
- in_bh = 0;//clear the BUSY state
- sti();
- /*
- * One last output flush.
- */
- dev_transmit();//flush any remaining output
- }
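The inner loop of net_bh() uses a small trick worth isolating: it always delivers to the previous match and only remembers the current one, so every handler except the last receives a clone and the last receives the original buffer, which saves one copy. Below is a hedged user-space sketch of that pattern. `struct handler`, `struct dispatch_result` and `dispatch()` are hypothetical names for this illustration, `PROTO_ALL` stands in for ETH_P_ALL, and byte order and the per-device check are ignored for simplicity.

```c
#include <stddef.h>

#define PROTO_ALL 0x0003   /* stands in for ETH_P_ALL (taps that want every frame) */

/* Hypothetical stand-in for struct packet_type. */
struct handler {
    unsigned short type;
    int hits;                /* packets delivered to this handler so far */
    struct handler *next;
};

struct dispatch_result {
    int delivered;           /* total deliveries for this packet */
    int clones;              /* deliveries that needed a copy */
};

/* Walk the handler list the way net_bh() walks ptype_base:
 * deliver a clone to the previously remembered match, then remember
 * the current match. The last match gets the original packet. */
struct dispatch_result dispatch(struct handler *list, unsigned short type)
{
    struct dispatch_result r = {0, 0};
    struct handler *prev = NULL;

    for (struct handler *h = list; h != NULL; h = h->next) {
        if (h->type != type && h->type != PROTO_ALL)
            continue;
        if (prev) {          /* an earlier match is waiting: it gets a clone */
            prev->hits++;
            r.clones++;
            r.delivered++;
        }
        prev = h;
    }
    if (prev) {              /* last match: hand over the original */
        prev->hits++;
        r.delivered++;
    }
    return r;                /* delivered == 0 means "unknown protocol, free it" */
}
```

With a tap and an IP handler both registered, an IP frame (type 0x0800) produces two deliveries but only one clone; a frame with no matching handler produces none, which corresponds to the kfree_skb() branch above.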
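2. Network layer
* Taking an IP packet as the example: when the packet is passed from the link layer to the network layer, ip_rcv is called. After finishing the processing for its own layer, ip_rcv calls the handler of the transport-layer protocol named in the IP header.
UDP maps to udp_rcv, TCP to tcp_rcv, ICMP to icmp_rcv and IGMP to igmp_rcv. (ICMP and IGMP are usually called network-layer protocols, but both are encapsulated in IP datagrams, so here they are treated like transport-layer protocols.)
This function is fairly complex and will be analysed in detail in a later post. It is pasted here to give a clearer picture of the whole: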
- /*
- * This function receives all incoming IP datagrams.
- */
- int ip_rcv(struct sk_buff *skb, struct device *dev, struct packet_type *pt)
- {
- struct iphdr *iph = skb->h.iph;
- struct sock *raw_sk=NULL;
- unsigned char hash;
- unsigned char flag = 0;
- unsigned char opts_p = 0; /* Set iff the packet has options. */
- struct inet_protocol *ipprot;
- static struct options opt; /* since we don't use these yet, and they
- take up stack space. */
- int brd=IS_MYADDR;
- int is_frag=0;
- #ifdef CONFIG_IP_FIREWALL
- int err;
- #endif
- ip_statistics.IpInReceives++;
- /*
- * Tag the ip header of this packet so we can find it
- */
- skb->ip_hdr = iph;
- /*
- * Is the datagram acceptable?
- *
- * 1. Length at least the size of an ip header
- * 2. Version of 4
- * 3. Checksums correctly. [Speed optimisation for later, skip loopback checksums]
- * (4. We ought to check for IP multicast addresses and undefined types.. does this matter ?)
- */
- if (skb->len<sizeof(struct iphdr) || iph->ihl<5 || iph->version != 4 ||
- skb->len<ntohs(iph->tot_len) || ip_fast_csum((unsigned char *)iph, iph->ihl) !=0)
- {
- ip_statistics.IpInHdrErrors++;
- kfree_skb(skb, FREE_WRITE);
- return(0);
- }
- /*
- * See if the firewall wants to dispose of the packet.
- */
- #ifdef CONFIG_IP_FIREWALL
- if ((err=ip_fw_chk(iph,dev,ip_fw_blk_chain,ip_fw_blk_policy, 0))!=1)
- {
- if(err==-1)
- icmp_send(skb, ICMP_DEST_UNREACH, ICMP_PORT_UNREACH, 0, dev);
- kfree_skb(skb, FREE_WRITE);
- return 0;
- }
- #endif
- /*
- * Our transport medium may have padded the buffer out. Now we know it
- * is IP we can trim to the true length of the frame.
- */
- skb->len=ntohs(iph->tot_len);
- /*
- * Next analyse the packet for options. Studies show under one packet in
- * a thousand have options....
- */
- if (iph->ihl != 5)
- { /* Fast path for the typical optionless IP packet. */
- memset((char *) &opt, 0, sizeof(opt));
- if (do_options(iph, &opt) != 0)
- return 0;
- opts_p = 1;
- }
- /*
- * Remember if the frame is fragmented.
- */
- if(iph->frag_off)
- {
- if (iph->frag_off & 0x0020)
- is_frag|=1;
- /*
- * Last fragment ?
- */
- if (ntohs(iph->frag_off) & 0x1fff)
- is_frag|=2;
- }
- /*
- * Do any IP forwarding required. chk_addr() is expensive -- avoid it someday.
- *
- * This is inefficient. While finding out if it is for us we could also compute
- * the routing table entry. This is where the great unified cache theory comes
- * in as and when someone implements it
- *
- * For most hosts over 99% of packets match the first conditional
- * and don't go via ip_chk_addr. Note: brd is set to IS_MYADDR at
- * function entry.
- */
- if ( iph->daddr != skb->dev->pa_addr && (brd = ip_chk_addr(iph->daddr)) == 0)
- {
- /*
- * Don't forward multicast or broadcast frames.
- */
- if(skb->pkt_type!=PACKET_HOST || brd==IS_BROADCAST)
- {
- kfree_skb(skb,FREE_WRITE);
- return 0;
- }
- /*
- * The packet is for another target. Forward the frame
- */
- #ifdef CONFIG_IP_FORWARD
- ip_forward(skb, dev, is_frag);
- #else
- /* printk("Machine %lx tried to use us as a forwarder to %lx but we have forwarding disabled!\n",
- iph->saddr,iph->daddr);*/
- ip_statistics.IpInAddrErrors++;
- #endif
- /*
- * The forwarder is inefficient and copies the packet. We
- * free the original now.
- */
- kfree_skb(skb, FREE_WRITE);
- return(0);
- }
- #ifdef CONFIG_IP_MULTICAST
- if(brd==IS_MULTICAST && iph->daddr!=IGMP_ALL_HOSTS && !(dev->flags&IFF_LOOPBACK))
- {
- /*
- * Check it is for one of our groups
- */
- struct ip_mc_list *ip_mc=dev->ip_mc_list;
- do
- {
- if(ip_mc==NULL)
- {
- kfree_skb(skb, FREE_WRITE);
- return 0;
- }
- if(ip_mc->multiaddr==iph->daddr)
- break;
- ip_mc=ip_mc->next;
- }
- while(1);
- }
- #endif
- /*
- * Account for the packet
- */
- #ifdef CONFIG_IP_ACCT
- ip_acct_cnt(iph,dev, ip_acct_chain);
- #endif
- /*
- * Reassemble IP fragments.
- */
- if(is_frag)
- {
- /* Defragment. Obtain the complete packet if there is one */
- skb=ip_defrag(iph,skb,dev);
- if(skb==NULL)
- return 0;
- skb->dev = dev;
- iph=skb->h.iph;
- }
- /*
- * Point into the IP datagram, just past the header.
- */
- skb->ip_hdr = iph;
- skb->h.raw += iph->ihl*4;
- /*
- * Deliver to raw sockets. This is fun as to avoid copies we want to make no surplus copies.
- */
- hash = iph->protocol & (SOCK_ARRAY_SIZE-1);
- /* If there maybe a raw socket we must check - if not we don't care less */
- if((raw_sk=raw_prot.sock_array[hash])!=NULL)
- {
- struct sock *sknext=NULL;
- struct sk_buff *skb1;
- raw_sk=get_sock_raw(raw_sk, hash, iph->saddr, iph->daddr);
- if(raw_sk) /* Any raw sockets */
- {
- do
- {
- /* Find the next */
- sknext=get_sock_raw(raw_sk->next, hash, iph->saddr, iph->daddr);
- if(sknext)
- skb1=skb_clone(skb, GFP_ATOMIC);
- else
- break; /* One pending raw socket left */
- if(skb1)
- raw_rcv(raw_sk, skb1, dev, iph->saddr,iph->daddr);
- raw_sk=sknext;
- }
- while(raw_sk!=NULL);
- /* Here either raw_sk is the last raw socket, or NULL if none */
- /* We deliver to the last raw socket AFTER the protocol checks as it avoids a surplus copy */
- }
- }
- /*
- * skb->h.raw now points at the protocol beyond the IP header.
- */
- hash = iph->protocol & (MAX_INET_PROTOS -1);
- for (ipprot = (struct inet_protocol *)inet_protos[hash];ipprot != NULL;ipprot=(struct inet_protocol *)ipprot->next)
- {
- struct sk_buff *skb2;
- if (ipprot->protocol != iph->protocol)
- continue;
- /*
- * See if we need to make a copy of it. This will
- * only be set if more than one protocol wants it.
- * and then not for the last one. If there is a pending
- * raw delivery wait for that
- */
- if (ipprot->copy || raw_sk)
- {
- skb2 = skb_clone(skb, GFP_ATOMIC);
- if(skb2==NULL)
- continue;
- }
- else
- {
- skb2 = skb;
- }
- flag = 1;
- /*
- * Pass on the datagram to each protocol that wants it,
- * based on the datagram protocol. We should really
- * check the protocol handler's return values here...
- */
- ipprot->handler(skb2, dev, opts_p ? &opt : 0, iph->daddr,
- (ntohs(iph->tot_len) - (iph->ihl * 4)),
- iph->saddr, 0, ipprot);
- }
- /*
- * All protocols checked.
- * If this packet was a broadcast, we may *not* reply to it, since that
- * causes (proven, grin) ARP storms and a leakage of memory (i.e. all
- * ICMP reply messages get queued up for transmission...)
- */
- if(raw_sk!=NULL) /* Shift to last raw user */
- raw_rcv(raw_sk, skb, dev, iph->saddr, iph->daddr);
- else if (!flag) /* Free and report errors */
- {
- if (brd != IS_BROADCAST && brd!=IS_MULTICAST)
- icmp_send(skb, ICMP_DEST_UNREACH, ICMP_PROT_UNREACH, 0, dev);
- kfree_skb(skb, FREE_WRITE);
- }
- return(0);
- }
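The first test in ip_rcv() ("Is the datagram acceptable?") can be reproduced in user space on a raw byte buffer. The following is only a sketch under stated assumptions: `ip_checksum()` and `ip_header_ok()` are hypothetical helpers written for this post, the checksum is the standard one's-complement sum described in RFC 1071 rather than the kernel's ip_fast_csum(), and the explicit `len < ihl * 4` bound is an addition of this sketch so that it never reads past the buffer.

```c
#include <stddef.h>
#include <stdint.h>

/* One's-complement checksum (RFC 1071 style) over len bytes.
 * Summed over a valid IPv4 header, checksum field included, it yields 0. */
uint16_t ip_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;

    for (size_t i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)((data[i] << 8) | data[i + 1]);
    if (len & 1)
        sum += (uint32_t)(data[len - 1] << 8);
    while (sum >> 16)                       /* fold carries back in */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Same acceptance rules as the first check in ip_rcv():
 * long enough, version 4, ihl >= 5, not truncated, checksum correct.
 * Returns 1 if the header is acceptable, 0 otherwise. */
int ip_header_ok(const uint8_t *pkt, size_t len)
{
    if (len < 20)                           /* sizeof(struct iphdr) */
        return 0;

    unsigned version = pkt[0] >> 4;
    unsigned ihl = pkt[0] & 0x0f;           /* header length in 32-bit words */
    size_t tot_len = ((size_t)pkt[2] << 8) | pkt[3];

    if (version != 4 || ihl < 5)
        return 0;
    if (len < (size_t)ihl * 4 || len < tot_len)  /* first bound is this sketch's addition */
        return 0;
    return ip_checksum(pkt, (size_t)ihl * 4) == 0;
}
```

A caller would pass the bytes starting at the IP header together with the number of bytes actually received. A return value of 0 corresponds to the branch above that increments IpInHdrErrors and frees the buffer.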
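3. Transport layer
If the IP header indicates that TCP carries the data, the function above calls tcp_rcv. Its overall processing flow is:
“The sock structures of all sockets that use TCP are linked into the sock_array array of the proto structure represented by the global variable tcp_prot, inserted with the local port number as the index. So when tcp_rcv receives a packet, after the necessary checks and processing it uses the destination port in the TCP header (for a received packet, the destination port is the port used locally) as the index to obtain the right queue of sock structures from the sock_array of tcp_prot. It then walks that queue, applying further criteria, to find the matching sock structure. Once the match is found, the packet is appended to that sock structure's buffer queue (pointed to by the receive_queue field of the sock structure), which completes reception of the packet.”
The implementation of this function is also fairly complex, which follows from the complexity of TCP itself. The code is attached below: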
- /*
- * A TCP packet has arrived.
- */
- int tcp_rcv(struct sk_buff *skb, struct device *dev, struct options *opt,
- unsigned long daddr, unsigned short len,
- unsigned long saddr, int redo, struct inet_protocol * protocol)
- {
- struct tcphdr *th;
- struct sock *sk;
- int syn_ok=0;
- if (!skb)
- {
- printk("IMPOSSIBLE 1\n");
- return(0);
- }
- if (!dev)
- {
- printk("IMPOSSIBLE 2\n");
- return(0);
- }
- tcp_statistics.TcpInSegs++;
- if(skb->pkt_type!=PACKET_HOST)
- {
- kfree_skb(skb,FREE_READ);
- return(0);
- }
- th = skb->h.th;
- /*
- * Find the socket.
- */
- sk = get_sock(&tcp_prot, th->dest, saddr, th->source, daddr);
- /*
- * If this socket has got a reset it's to all intents and purposes
- * really dead. Count closed sockets as dead.
- *
- * Note: BSD appears to have a bug here. A 'closed' TCP in BSD
- * simply drops data. This seems incorrect as a 'closed' TCP doesn't
- * exist so should cause resets as if the port was unreachable.
- */
- if (sk!=NULL && (sk->zapped || sk->state==TCP_CLOSE))
- sk=NULL;
- if (!redo)
- {
- if (tcp_check(th, len, saddr, daddr ))
- {
- skb->sk = NULL;
- kfree_skb(skb,FREE_READ);
- /*
- * We don't release the socket because it was
- * never marked in use.
- */
- return(0);
- }
- th->seq = ntohl(th->seq);
- /* See if we know about the socket. */
- if (sk == NULL)
- {
- /*
- * No such TCB. If th->rst is 0 send a reset (checked in tcp_reset)
- */
- tcp_reset(daddr, saddr, th, &tcp_prot, opt,dev,skb->ip_hdr->tos,255);
- skb->sk = NULL;
- /*
- * Discard frame
- */
- kfree_skb(skb, FREE_READ);
- return(0);
- }
- skb->len = len;
- skb->acked = 0;
- skb->used = 0;
- skb->free = 0;
- skb->saddr = daddr;
- skb->daddr = saddr;
- /* We may need to add it to the backlog here. */
- cli();
- if (sk->inuse)
- {
- skb_queue_tail(&sk->back_log, skb);
- sti();
- return(0);
- }
- sk->inuse = 1;
- sti();
- }
- else
- {
- if (sk==NULL)
- {
- tcp_reset(daddr, saddr, th, &tcp_prot, opt,dev,skb->ip_hdr->tos,255);
- skb->sk = NULL;
- kfree_skb(skb, FREE_READ);
- return(0);
- }
- }
- if (!sk->prot)
- {
- printk("IMPOSSIBLE 3\n");
- return(0);
- }
- /*
- * Charge the memory to the socket.
- */
- if (sk->rmem_alloc + skb->mem_len >= sk->rcvbuf)
- {
- kfree_skb(skb, FREE_READ);
- release_sock(sk);
- return(0);
- }
- skb->sk=sk;
- sk->rmem_alloc += skb->mem_len;
- /*
- * This basically follows the flow suggested by RFC793, with the corrections in RFC1122. We
- * don't implement precedence and we process URG incorrectly (deliberately so) for BSD bug
- * compatibility. We also set up variables more thoroughly [Karn notes in the
- * KA9Q code the RFC793 incoming segment rules don't initialise the variables for all paths].
- */
- if(sk->state!=TCP_ESTABLISHED) /* Skip this lot for normal flow */
- {
- /*
- * Now deal with unusual cases.
- */
- if(sk->state==TCP_LISTEN)
- {
- if(th->ack) /* These use the socket TOS.. might want to be the received TOS */
- tcp_reset(daddr,saddr,th,sk->prot,opt,dev,sk->ip_tos, sk->ip_ttl);
- /*
- * We don't care for RST, and non SYN are absorbed (old segments)
- * Broadcast/multicast SYN isn't allowed. Note - bug if you change the
- * netmask on a running connection it can go broadcast. Even Sun's have
- * this problem so I'm ignoring it
- */
- if(th->rst || !th->syn || th->ack || ip_chk_addr(daddr)!=IS_MYADDR)
- {
- kfree_skb(skb, FREE_READ);
- release_sock(sk);
- return 0;
- }
- /*
- * Guess we need to make a new socket up
- */
- tcp_conn_request(sk, skb, daddr, saddr, opt, dev, tcp_init_seq());
- /*
- * Now we have several options: In theory there is nothing else
- * in the frame. KA9Q has an option to send data with the syn,
- * BSD accepts data with the syn up to the [to be] advertised window
- * and Solaris 2.1 gives you a protocol error. For now we just ignore
- * it, that fits the spec precisely and avoids incompatibilities. It
- * would be nice in future to drop through and process the data.
- */
- release_sock(sk);
- return 0;
- }
- /* retransmitted SYN? */
- if (sk->state == TCP_SYN_RECV && th->syn && th->seq+1 == sk->acked_seq)
- {
- kfree_skb(skb, FREE_READ);
- release_sock(sk);
- return 0;
- }
- /*
- * SYN sent means we have to look for a suitable ack and either reset
- * for bad matches or go to connected
- */
- if(sk->state==TCP_SYN_SENT)
- {
- /* Crossed SYN or previous junk segment */
- if(th->ack)
- {
- /* We got an ack, but it's not a good ack */
- if(!tcp_ack(sk,th,saddr,len))
- {
- /* Reset the ack - its an ack from a
- different connection [ th->rst is checked in tcp_reset()] */
- tcp_statistics.TcpAttemptFails++;
- tcp_reset(daddr, saddr, th,
- sk->prot, opt,dev,sk->ip_tos,sk->ip_ttl);
- kfree_skb(skb, FREE_READ);
- release_sock(sk);
- return(0);
- }
- if(th->rst)
- return tcp_std_reset(sk,skb);
- if(!th->syn)
- {
- /* A valid ack from a different connection
- start. Shouldn't happen but cover it */
- kfree_skb(skb, FREE_READ);
- release_sock(sk);
- return 0;
- }
- /*
- * Ok.. it's good. Set up sequence numbers and
- * move to established.
- */
- syn_ok=1; /* Don't reset this connection for the syn */
- sk->acked_seq=th->seq+1;
- sk->fin_seq=th->seq;
- tcp_send_ack(sk->sent_seq,sk->acked_seq,sk,th,sk->daddr);
- tcp_set_state(sk, TCP_ESTABLISHED);
- tcp_options(sk,th);
- sk->dummy_th.dest=th->source;
- sk->copied_seq = sk->acked_seq;
- if(!sk->dead)
- {
- sk->state_change(sk);
- sock_wake_async(sk->socket, 0);
- }
- if(sk->max_window==0)
- {
- sk->max_window = 32;
- sk->mss = min(sk->max_window, sk->mtu);
- }
- }
- else
- {
- /* See if SYN's cross. Drop if boring */
- if(th->syn && !th->rst)
- {
- /* Crossed SYN's are fine - but talking to
- yourself is right out... */
- if(sk->saddr==saddr && sk->daddr==daddr &&
- sk->dummy_th.source==th->source &&
- sk->dummy_th.dest==th->dest)
- {
- tcp_statistics.TcpAttemptFails++;
- return tcp_std_reset(sk,skb);
- }
- tcp_set_state(sk,TCP_SYN_RECV);
- /*
- * FIXME:
- * Must send SYN|ACK here
- */
- }
- /* Discard junk segment */
- kfree_skb(skb, FREE_READ);
- release_sock(sk);
- return 0;
- }
- /*
- * SYN_RECV with data maybe.. drop through
- */
- goto rfc_step6;
- }
- /*
- * BSD has a funny hack with TIME_WAIT and fast reuse of a port. There is
- * a more complex suggestion for fixing these reuse issues in RFC1644
- * but not yet ready for general use. Also see RFC1379.
- */
- #define BSD_TIME_WAIT
- #ifdef BSD_TIME_WAIT
- if (sk->state == TCP_TIME_WAIT && th->syn && sk->dead &&
- after(th->seq, sk->acked_seq) && !th->rst)
- {
- long seq=sk->write_seq;
- if(sk->debug)
- printk("Doing a BSD time wait\n");
- tcp_statistics.TcpEstabResets++;
- sk->rmem_alloc -= skb->mem_len;
- skb->sk = NULL;
- sk->err=ECONNRESET;
- tcp_set_state(sk, TCP_CLOSE);
- sk->shutdown = SHUTDOWN_MASK;
- release_sock(sk);
- sk=get_sock(&tcp_prot, th->dest, saddr, th->source, daddr);
- if (sk && sk->state==TCP_LISTEN)
- {
- sk->inuse=1;
- skb->sk = sk;
- sk->rmem_alloc += skb->mem_len;
- tcp_conn_request(sk, skb, daddr, saddr,opt, dev,seq+128000);
- release_sock(sk);
- return 0;
- }
- kfree_skb(skb, FREE_READ);
- return 0;
- }
- #endif
- }
- /*
- * We are now in normal data flow (see the step list in the RFC)
- * Note most of these are inline now. I'll inline the lot when
- * I have time to test it hard and look at what gcc outputs
- */
- if(!tcp_sequence(sk,th,len,opt,saddr,dev))
- {
- kfree_skb(skb, FREE_READ);
- release_sock(sk);
- return 0;
- }
- if(th->rst)
- return tcp_std_reset(sk,skb);
- /*
- * !syn_ok is effectively the state test in RFC793.
- */
- if(th->syn && !syn_ok)
- {
- tcp_reset(daddr,saddr,th, &tcp_prot, opt, dev, skb->ip_hdr->tos, 255);
- return tcp_std_reset(sk,skb);
- }
- /*
- * Process the ACK
- */
- if(th->ack && !tcp_ack(sk,th,saddr,len))
- {
- /*
- * Our three way handshake failed.
- */
- if(sk->state==TCP_SYN_RECV)
- {
- tcp_reset(daddr, saddr, th,sk->prot, opt, dev,sk->ip_tos,sk->ip_ttl);
- }
- kfree_skb(skb, FREE_READ);
- release_sock(sk);
- return 0;
- }
- rfc_step6: /* I'll clean this up later */
- /*
- * Process urgent data
- */
- if(tcp_urg(sk, th, saddr, len))
- {
- kfree_skb(skb, FREE_READ);
- release_sock(sk);
- return 0;
- }
- /*
- * Process the encapsulated data
- */
- if(tcp_data(skb,sk, saddr, len))
- {
- kfree_skb(skb, FREE_READ);
- release_sock(sk);
- return 0;
- }
- /*
- * And done
- */
- release_sock(sk);
- return 0;
- }
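The lookup described at the start of this section (index by local port, then walk the bucket and apply further criteria) can be sketched as a small hash table. This is an illustration, not the kernel's get_sock(): `struct conn`, `struct conn_table`, `conn_insert()` and `conn_lookup()` are hypothetical names, the table size of 256 is an assumption, a zero field is treated as a wildcard, and preferring the most specific match is a design choice of this sketch.

```c
#include <stddef.h>
#include <stdint.h>

#define SOCK_TABLE_SIZE 256   /* assumed power of two, in the spirit of SOCK_ARRAY_SIZE */

/* Hypothetical stand-in for struct sock; 0 means "wildcard" for the
 * address and remote-port fields (as for a listening socket). */
struct conn {
    uint16_t local_port;
    uint32_t local_addr;
    uint32_t remote_addr;
    uint16_t remote_port;
    struct conn *next;
};

struct conn_table {
    struct conn *bucket[SOCK_TABLE_SIZE];
};

/* Insert at the head of the bucket selected by the local port. */
void conn_insert(struct conn_table *t, struct conn *c)
{
    unsigned h = c->local_port & (SOCK_TABLE_SIZE - 1);

    c->next = t->bucket[h];
    t->bucket[h] = c;
}

/* Index by destination (local) port, then walk the bucket and keep
 * the entry that matches on the most non-wildcard fields. */
struct conn *conn_lookup(const struct conn_table *t,
                         uint16_t local_port, uint32_t local_addr,
                         uint16_t remote_port, uint32_t remote_addr)
{
    struct conn *best = NULL;
    int best_score = -1;

    for (struct conn *c = t->bucket[local_port & (SOCK_TABLE_SIZE - 1)];
         c != NULL; c = c->next) {
        int score = 0;

        if (c->local_port != local_port)
            continue;               /* same bucket, different port */
        if (c->local_addr) {
            if (c->local_addr != local_addr) continue;
            score++;
        }
        if (c->remote_addr) {
            if (c->remote_addr != remote_addr) continue;
            score++;
        }
        if (c->remote_port) {
            if (c->remote_port != remote_port) continue;
            score++;
        }
        if (score > best_score) {
            best = c;
            best_score = score;
        }
    }
    return best;                    /* NULL corresponds to the tcp_reset() path */
}
```

With both a listening entry and an established entry on port 80, a segment from the established peer resolves to the established entry, while a segment from any other peer falls back to the listener.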
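4. Application layer
When the user wants to receive data, the kernel first uses the inode behind the file descriptor to find the socket structure and the sock structure. It then reads packets from the receive_queue that the sock structure points to and copies them into the user-space buffer. At that point the data has travelled all the way from the hardware to user space, completing one full bottom-to-top pass through the stack.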
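The last step, draining the receive queue into the caller's buffer, can be modelled in a few lines. This is a sketch with hypothetical names (`struct rx_buf`, `struct rx_queue`, `queue_read()`), using a plain memcpy() where the kernel would use its own primitive for copying into user space. It assumes stream semantics, so a read may stop in the middle of a queued buffer and resume there on the next call.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a queued sk_buff. */
struct rx_buf {
    const unsigned char *data;
    size_t len;
    size_t used;             /* bytes already handed to the reader */
    struct rx_buf *next;
};

/* Hypothetical stand-in for the sock's receive_queue. */
struct rx_queue {
    struct rx_buf *head;
};

/* Copy up to `want` bytes from the queue into dst.
 * Fully consumed buffers are unlinked; a partially consumed buffer
 * stays at the head. Returns the number of bytes copied. */
size_t queue_read(struct rx_queue *q, unsigned char *dst, size_t want)
{
    size_t copied = 0;

    while (q->head != NULL && copied < want) {
        struct rx_buf *b = q->head;
        size_t avail = b->len - b->used;
        size_t n = (want - copied < avail) ? want - copied : avail;

        memcpy(dst + copied, b->data + b->used, n);
        b->used += n;
        copied += n;
        if (b->used == b->len)
            q->head = b->next;   /* fully consumed: unlink */
    }
    return copied;
}
```

Reading 8 bytes from a queue holding "hello " and "world" returns "hello wo" and leaves "rld" for the next call. The model never blocks; a real read on an empty queue would normally put the caller to sleep until data arrives.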