Flink User Portrait

The components we will use are Hadoop 2.6, HBase 1.0.0, MySQL 8, ZooKeeper 3.4.5, Kafka 2.1.0, Flink 1.13, and Canal 1.1.5. For convenience, everything here is installed as a pseudo-cluster or in standalone mode.

A simple Hadoop 2.6 installation

hadoop-env.sh

export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_172.jdk/Contents/Home

core-site.xml

<configuration>
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://127.0.0.1:9000</value>
</property>
<property>
    <name>hadoop.tmp.dir</name>
    <value>/Users/admin/Downloads/hadoop2</value>
</property>
</configuration>

hdfs-site.xml

<configuration>
	<property>
		<name>dfs.replication</name>
		<value>1</value>
	</property>
	<property>
		<name>dfs.permissions</name>
		<value>false</value>
	</property>
</configuration>

Run the following in the bin directory:

./hdfs namenode -format

Run the following in the sbin directory:

./start-dfs.sh

Web UI address:

http://127.0.0.1:50070/

ZooKeeper 3.4.5 installation

zoo.cfg

dataDir=/Users/admin/Downloads/zookeeper/data

Run the following in the bin directory:

./zkServer.sh start

HBase 1.0.0 installation

hbase-env.sh

export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_172.jdk/Contents/Home

hbase-site.xml

<configuration>
	<property>
		<name>hbase.rootdir</name>
		<value>hdfs://127.0.0.1:9000/hbase</value>
	</property>
	<property>
		<name>hbase.cluster.distributed</name>
		<value>true</value>
	</property>
	<property>
		<name>hbase.zookeeper.quorum</name>
		<value>localhost</value>
	</property>
	<property>
		<name>dfs.replication</name>
		<value>1</value>
	</property>
</configuration>

Run the following in the bin directory:

./start-hbase.sh

Web UI address:

http://127.0.0.1:60010/

Kafka 2.1.0 installation

server.properties

log.dirs=/Users/admin/Downloads/kafka-logs

Run the following in the bin directory:

./kafka-server-start.sh ../config/server.properties 
./kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
./kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test

MySQL 8 installation with Docker

Create a directory (here mysql-bin) and in it a file my.cnf with the following content:

[client]
socket = /var/sock/mysqld/mysqld.sock
[mysql]
socket = /var/sock/mysqld/mysqld.sock
[mysqld]
skip-host-cache
skip-name-resolve
datadir = /var/lib/mysql
user = mysql
port = 3306
bind-address = 0.0.0.0
socket = /var/sock/mysqld/mysqld.sock
pid-file = /var/run/mysqld/mysqld.pid
general_log_file = /var/log/mysql/query.log
slow_query_log_file = /var/log/mysql/slow.log
log-error = /var/log/mysql/error.log
log-bin=mysql-bin
binlog-format=ROW
server-id=1
!includedir /etc/my.cnf.d/
!includedir /etc/mysql/conf.d/
!includedir /etc/mysql/docker-default.d/

Start command:

docker run -d --name mysql -e MYSQL_ROOT_PASSWORD=abcd123 -p 3306:3306 -v /Users/admin/Downloads/mysql-bin/my.cnf:/etc/my.cnf docker.io/cytopia/mysql-8.0

Create a database named portrait.

Create the table:

DROP TABLE IF EXISTS `user_info`;
CREATE TABLE `user_info` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `account` varchar(255) DEFAULT NULL,
  `password` varchar(255) DEFAULT NULL,
  `sex` varchar(255) DEFAULT NULL,
  `age` int(11) DEFAULT NULL,
  `phone` varchar(255) DEFAULT NULL,
  `status` int(255) DEFAULT NULL COMMENT 'membership level: 0 = regular, 1 = silver, 2 = gold',
  `wechat_account` varchar(255) DEFAULT NULL,
  `zhifubao_account` varchar(255) DEFAULT NULL,
  `email` varchar(255) DEFAULT NULL,
  `create_time` datetime DEFAULT NULL,
  `update_time` datetime DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=6 DEFAULT CHARSET=utf8mb4;

SET FOREIGN_KEY_CHECKS = 1;

Canal 1.1.5 installation

canal.properties

canal.zkServers = 127.0.0.1:2181
canal.serverMode = kafka

In instance.properties under the conf/example directory:

canal.instance.master.address=127.0.0.1:3306
canal.instance.dbUsername=root
canal.instance.dbPassword=abcd123
canal.instance.defaultDatabaseName=portrait
canal.mq.topic=test

Run the following in the bin directory:

./startup.sh

Now let's insert a row into the database:

insert into user_info (account,password,sex,age,phone,status,wechat_account,zhifubao_account,email,create_time,update_time) 
values ('abcd','1234','男',24,'13873697762',0,'火名之月','abstart','[email protected]','2021-09-10','2021-10-11')

The Kafka console consumer then shows:

[2021-11-05 15:13:05,173] INFO [GroupMetadataManager brokerId=0] Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
{
"data":[
{
"id":"8",
"account":"abcd",
"password":"1234",
"sex":"男",
"age":"24",
"phone":"13873697762",
"status":"0",
"wechat_account":"火名之月",
"zhifubao_account":"abstart",
"email":"[email protected]",
"create_time":"2021-09-10 00:00:00",
"update_time":"2021-10-11 00:00:00"
}
],
"database":"portrait",
"es":1636096762000,
"id":11,
"isDdl":false,
"mysqlType":{
"id":"bigint(0)",
"account":"varchar(255)",
"password":"varchar(255)",
"sex":"varchar(255)",
"age":"int(0)",
"phone":"varchar(255)",
"status":"int(255)",
"wechat_account":"varchar(255)",
"zhifubao_account":"varchar(255)",
"email":"varchar(255)",
"create_time":"datetime(0)",
"update_time":"datetime(0)"
},
"old":null,
"pkNames":[
"id"
],
"sql":"",
"sqlType":{
"id":-5,
"account":12,
"password":12,
"sex":12,
"age":4,
"phone":12,
"status":4,
"wechat_account":12,
"zhifubao_account":12,
"email":12,
"create_time":93,
"update_time":93
},
"table":"user_info",
"ts":1636096762605,
"type":"INSERT"
}

Processing the messages as a stream with Flink

The Java dependencies are listed below. For more detail on Flink itself, see the earlier article Flink技术整理; since this article uses 1.13.0 while that one used 1.7.2, some of the APIs used there are no longer available.

<properties>
   <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
   <flink.version>1.13.0</flink.version>
   <alink.version>1.4.0</alink.version>
   <fastjson.version>1.2.74</fastjson.version>
   <java.version>1.8</java.version>
   <scala.version>2.11.12</scala.version>
   <hadoop.version>2.6.0</hadoop.version>
   <hbase.version>1.0.0</hbase.version>
   <scala.binary.version>2.11</scala.binary.version>
   <maven.compiler.source>${java.version}</maven.compiler.source>
   <maven.compiler.target>${java.version}</maven.compiler.target>
</properties>
<dependencies>
   <!-- Apache Flink dependencies -->
   <!-- These dependencies are provided, because they should not be packaged into the JAR file. -->
   <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-java</artifactId>
      <version>${flink.version}</version>
      <scope>provided</scope>
   </dependency>
   <dependency>
      <groupId>com.alibaba.alink</groupId>
      <artifactId>alink_core_flink-1.13_2.11</artifactId>
      <version>${alink.version}</version>
   </dependency>
   <dependency>
      <groupId>ru.yandex.clickhouse</groupId>
      <artifactId>clickhouse-jdbc</artifactId>
      <version>0.1.40</version>
   </dependency>
   <dependency>
      <groupId>com.alibaba</groupId>
      <artifactId>fastjson</artifactId>
      <version>${fastjson.version}</version>
   </dependency>
   <dependency>
      <groupId>com.google.guava</groupId>
      <artifactId>guava</artifactId>
      <version>15.0</version>
   </dependency>
   <dependency>
      <groupId>org.apache.commons</groupId>
      <artifactId>commons-compress</artifactId>
      <version>1.21</version>
   </dependency>
   <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-streaming-java_${scala.binary.version}</artifactId>
      <version>${flink.version}</version>
   </dependency>
   <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-clients_${scala.binary.version}</artifactId>
      <version>${flink.version}</version>
   </dependency>
   <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-scala_${scala.binary.version}</artifactId>
      <version>${flink.version}</version>
   </dependency>
   <dependency>
      <groupId>org.apache.hbase</groupId>
      <artifactId>hbase-client</artifactId>
      <version>${hbase.version}</version>
   </dependency>
   <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-streaming-scala_${scala.binary.version}</artifactId>
      <version>${flink.version}</version>
   </dependency>
   <dependency>
      <groupId>org.scala-lang</groupId>
      <artifactId>scala-library</artifactId>
      <version>${scala.version}</version>
   </dependency>
   <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-table-planner_${scala.binary.version}</artifactId>
      <version>${flink.version}</version>
   </dependency>
   <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
      <version>${hadoop.version}</version>
   </dependency>
   <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-common</artifactId>
      <version>${hadoop.version}</version>
   </dependency>
   <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-hdfs</artifactId>
      <version>${hadoop.version}</version>
   </dependency>
   <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-connector-kafka_2.11</artifactId>
      <version>${flink.version}</version>
   </dependency>
   <dependency>
      <groupId>org.apache.kafka</groupId>
      <artifactId>kafka-clients</artifactId>
      <version>1.1.1</version>
   </dependency>
   <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-statebackend-rocksdb_2.11</artifactId>
      <version>${flink.version}</version>
   </dependency>
   <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-connector-elasticsearch6_2.11</artifactId>
      <version>${flink.version}</version>
   </dependency>
   <dependency>
      <groupId>mysql</groupId>
      <artifactId>mysql-connector-java</artifactId>
      <version>8.0.11</version>
   </dependency>
   <dependency>
      <groupId>org.projectlombok</groupId>
      <artifactId>lombok</artifactId>
      <version>1.18.16</version>
      <optional>true</optional>
   </dependency>
   <dependency>
      <groupId>org.slf4j</groupId>
      <artifactId>slf4j-log4j12</artifactId>
      <version>1.7.7</version>
      <scope>runtime</scope>
   </dependency>
   <dependency>
      <groupId>log4j</groupId>
      <artifactId>log4j</artifactId>
      <version>1.2.17</version>
      <scope>runtime</scope>
   </dependency>
</dependencies>

First, let's read the Kafka messages with Flink.

public class Test {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        Properties properties = new Properties();
        properties.setProperty("bootstrap.servers","127.0.0.1:9092");
        properties.setProperty("group.id","portrait");
        FlinkKafkaConsumer<String> myConsumer = new FlinkKafkaConsumer<>("test",
                new SimpleStringSchema(),properties);

        DataStreamSource<String> data = env.addSource(myConsumer);
        env.enableCheckpointing(5000);
        data.print();
        env.execute("portrait test");
    }
}

The output:

16:39:42,070 INFO  org.apache.kafka.clients.consumer.internals.AbstractCoordinator  - [Consumer clientId=consumer-21, groupId=portrait] Discovered group coordinator admindembp.lan:9092 (id: 2147483647 rack: null)
15> {"data":[{"id":"8","account":"abcd","password":"1234","sex":"男","age":"24","phone":"13873697762","status":"0","wechat_account":"火名之月","zhifubao_account":"abstart","email":"[email protected]","create_time":"2021-09-10 00:00:00","update_time":"2021-10-11 00:00:00"}],"database":"portrait","es":1636096762000,"id":11,"isDdl":false,"mysqlType":{"id":"bigint(0)","account":"varchar(255)","password":"varchar(255)","sex":"varchar(255)","age":"int(0)","phone":"varchar(255)","status":"int(255)","wechat_account":"varchar(255)","zhifubao_account":"varchar(255)","email":"varchar(255)","create_time":"datetime(0)","update_time":"datetime(0)"},"old":null,"pkNames":["id"],"sql":"","sqlType":{"id":-5,"account":12,"password":12,"sex":12,"age":4,"phone":12,"status":4,"wechat_account":12,"zhifubao_account":12,"email":12,"create_time":93,"update_time":93},"table":"user_info","ts":1636096762605,"type":"INSERT"}

Now let's parse this data and store it in HBase.

Create a UserInfo entity class:

@Data
@ToString
public class UserInfo {
    private Long id;
    private String account;
    private String password;
    private String sex;
    private Integer age;
    private String phone;
    private Integer status;
    private String wechatAccount;
    private String zhifubaoAccount;
    private String email;
    private Date createTime;
    private Date updateTime;
}

An HBase utility class:

@Slf4j
public class HbaseUtil {
    private static Admin admin = null;
    private static Connection conn = null;

    static {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.rootdir","hdfs://127.0.0.1:9000/hbase");
        conf.set("hbase.zookeeper.quorum","127.0.0.1");
        conf.set("hbase.client.scanner.timeout.period","600000");
        conf.set("hbase.rpc.timeout","600000");
        try {
            conn = ConnectionFactory.createConnection(conf);
            admin = conn.getAdmin();
        }catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static void createTable(String tableName,String famliyname) throws IOException {
        HTableDescriptor tab = new HTableDescriptor(tableName);
        HColumnDescriptor colDesc = new HColumnDescriptor(famliyname);
        tab.addFamily(colDesc);
        admin.createTable(tab);
        log.info("over");
    }

    public static void put(String tablename, String rowkey, String famliyname, Map<String,String> datamap) throws IOException {
        Table table = conn.getTable(TableName.valueOf(tablename));
        byte[] rowkeybyte = Bytes.toBytes(rowkey);
        Put put = new Put(rowkeybyte);
        if (datamap != null) {
            Set<Map.Entry<String,String>> set = datamap.entrySet();
            for (Map.Entry<String,String> entry : set) {
                String key = entry.getKey();
                Object value = entry.getValue();
                put.addColumn(Bytes.toBytes(famliyname),Bytes.toBytes(key),
                        Bytes.toBytes(value + ""));
            }
        }
        table.put(put);
        table.close();
        log.info("OK");
    }

    public static String getdata(String tablename,String rowkey,
                                 String famliyname,String colmn) throws IOException {
        Table table = conn.getTable(TableName.valueOf(tablename));
        byte[] rowkeybyte = Bytes.toBytes(rowkey);
        Get get = new Get(rowkeybyte);
        Result result = table.get(get);
        byte[] resultbytes = result.getValue(famliyname.getBytes(),colmn.getBytes());
        if (resultbytes == null) {
            return null;
        }
        return new String(resultbytes);
    }

    public static void putdata(String tablename,String rowkey,
                               String famliyname,String colum,
                               String data) throws IOException {
        Table table = conn.getTable(TableName.valueOf(tablename));
        Put put = new Put(rowkey.getBytes());
        put.addColumn(famliyname.getBytes(),colum.getBytes(),data.getBytes());
        table.put(put);
    }

    public static void main(String[] args) throws IOException {
//        createTable("testinfo","time");
        putdata("testinfo","1","time","info","ty");
//        Map<String,String> datamap = new HashMap<>();
//        datamap.put("info1","ty1");
//        datamap.put("info2","ty2");
//        put("testinfo","2","time",datamap);
        String result = getdata("testinfo","1","time","info");
        log.info(result);
    }
}

Run the following in the HBase bin directory:

./hbase shell
create "user_info","info"
@Slf4j
public class TranferAnaly {

    @SuppressWarnings("unchecked")
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        Properties properties = new Properties();
        properties.setProperty("bootstrap.servers","127.0.0.1:9092");
        properties.setProperty("group.id","portrait");
        FlinkKafkaConsumer<String> myConsumer = new FlinkKafkaConsumer<>("test",
                new SimpleStringSchema(),properties);

        DataStreamSource<String> data = env.addSource(myConsumer);
        env.enableCheckpointing(5000);
        DataStream<String> map = data.map(s -> {
            JSONObject jsonObject = JSONObject.parseObject(s);
            String type = jsonObject.getString("type");
            String table = jsonObject.getString("table");
            String database = jsonObject.getString("database");
            String data1 = jsonObject.getString("data");
            List<UserInfo> list = JSONObject.parseArray(data1,UserInfo.class);
            log.info(list.toString());
            for (UserInfo userInfo : list) {
                String tablename = table;
                String rowkey = userInfo.getId() + "";
                String famliyname = "info";
                Map<String,String> datamap = JSONObject.parseObject(JSONObject.toJSONString(userInfo),Map.class);
                datamap.put("database",database);
                datamap.put("typebefore",HbaseUtil.getdata(tablename,rowkey,famliyname,"typecurrent"));
                datamap.put("typecurrent",type);
                HbaseUtil.put(tablename,rowkey,famliyname,datamap);
            }
            return null;
        });
//        map.print();
        env.execute("portrait test");
    }
}

Querying it in HBase:

scan 'user_info'
ROW                                                          COLUMN+CELL                                                                                                                                                                      
 12                                                          column=info:account, timestamp=1636105093607, value=abcd                                                                                                                         
 12                                                          column=info:age, timestamp=1636105093607, value=24                                                                                                                               
 12                                                          column=info:createTime, timestamp=1636105093607, value=1631203200000                                                                                                             
 12                                                          column=info:database, timestamp=1636105093607, value=portrait                                                                                                                    
 12                                                          column=info:email, timestamp=1636105093607, [email protected]                                                                                                                  
 12                                                          column=info:id, timestamp=1636105093607, value=12                                                                                                                                
 12                                                          column=info:password, timestamp=1636105093607, value=1234                                                                                                                        
 12                                                          column=info:phone, timestamp=1636105093607, value=13873697762                                                                                                                    
 12                                                          column=info:sex, timestamp=1636105093607, value=\xE7\x94\xB7                                                                                                                     
 12                                                          column=info:status, timestamp=1636105093607, value=0                                                                                                                             
 12                                                          column=info:typebefore, timestamp=1636105093607, value=null                                                                                                                      
 12                                                          column=info:typecurrent, timestamp=1636105093607, value=INSERT                                                                                                                   
 12                                                          column=info:updateTime, timestamp=1636105093607, value=1633881600000                                                                                                             
 12                                                          column=info:wechatAccount, timestamp=1636105093607, value=\xE7\x81\xAB\xE5\x90\x8D\xE4\xB9\x8B\xE6\x9C\x88                                                                       
 12                                                          column=info:zhifubaoAccount, timestamp=1636105093607, value=abstart                                                                                                              
1 row(s) in 0.0110 seconds

Now let's pass the data further downstream.

Run the following in the Kafka bin directory to create a new topic (and start a console consumer on it):

./kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic user_info
./kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic user_info

Add a Kafka utility class:

@Slf4j
public class KafkaUtil {
    private static Properties getProps() {
        Properties props = new Properties();
        props.put("bootstrap.servers","127.0.0.1:9092");
        props.put("acks","all");
        props.put("retries",2);
        props.put("linger.ms",1000);
        props.put("client.id","producer-syn-1");
        props.put("key.serializer","org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer","org.apache.kafka.common.serialization.StringSerializer");
        return props;
    }

    public static void sendData(String topicName,String data) throws ExecutionException, InterruptedException {
        KafkaProducer<String,String> producer = new KafkaProducer<>(getProps());
        ProducerRecord<String,String> record = new ProducerRecord<>(topicName,data);
        Future<RecordMetadata> metadataFuture = producer.send(record);
        RecordMetadata recordMetadata = metadataFuture.get();
        log.info("topic:" + recordMetadata.topic());
        log.info("partition:" + recordMetadata.partition());
        log.info("offset:" + recordMetadata.offset());
    }
}

Then send the messages on:

@Slf4j
public class TranferAnaly {

    @SuppressWarnings("unchecked")
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        Properties properties = new Properties();
        properties.setProperty("bootstrap.servers","127.0.0.1:9092");
        properties.setProperty("group.id","portrait");
        FlinkKafkaConsumer<String> myConsumer = new FlinkKafkaConsumer<>("test",
                new SimpleStringSchema(),properties);

        DataStreamSource<String> data = env.addSource(myConsumer);
        env.enableCheckpointing(5000);
        DataStream<String> map = data.map(s -> {
            JSONObject jsonObject = JSONObject.parseObject(s);
            String type = jsonObject.getString("type");
            String table = jsonObject.getString("table");
            String database = jsonObject.getString("database");
            String data1 = jsonObject.getString("data");
            List<UserInfo> list = JSONObject.parseArray(data1,UserInfo.class);
            List<Map<String,String>> listdata = new ArrayList<>();
            log.info(list.toString());
            for (UserInfo userInfo : list) {
                String tablename = table;
                String rowkey = userInfo.getId() + "";
                String famliyname = "info";
                Map<String,String> datamap = JSONObject.parseObject(JSONObject.toJSONString(userInfo),Map.class);
                datamap.put("database",database);
                datamap.put("typebefore",HbaseUtil.getdata(tablename,rowkey,famliyname,"typecurrent"));
                datamap.put("typecurrent",type);
                datamap.put("tablename",table);
                HbaseUtil.put(tablename,rowkey,famliyname,datamap);
                listdata.add(datamap);
            }
            return JSONObject.toJSONString(listdata);
        });
        map.addSink(new SinkFunction<String>() {
            @Override
            public void invoke(String value, Context context) throws Exception {
                List<Map> data = JSONObject.parseArray(value,Map.class);
                for (Map<String,String> map : data) {
                    String tablename = map.get("tablename");
                    KafkaUtil.sendData(tablename,JSONObject.toJSONString(map));
                }
            }
        });
        env.execute("portrait test");
    }
}

The Kafka console consumer shows:

[2021-11-05 20:11:26,869] INFO [GroupCoordinator 0]: Assignment received from leader for group console-consumer-47692 for generation 1 (kafka.coordinator.group.GroupCoordinator)
{"wechatAccount":"火名之月","sex":"男","zhifubaoAccount":"abstart","updateTime":1633881600000,"password":"1234","database":"portrait","createTime":1631203200000,"phone":"13873697762","typecurrent":"INSERT","id":15,"tablename":"user_info","account":"abcd","age":24,"email":"[email protected]","status":0}

Creating the user-portrait decade (Years) label

Create a decade-label entity class:

@Data
public class Years {
    private Long userid;
    private String yearsFlag;
    private Long numbers = 0L;
    private String groupField;
}

Create a YearsUntil utility class:

public class YearsUntil {
    public static String getYears(Integer age) {
        Calendar calendar = Calendar.getInstance();
        calendar.setTime(new Date());
        calendar.add(Calendar.YEAR,-age);
        Date newDate = calendar.getTime();
        DateFormat dateFormat = new SimpleDateFormat("yyyy");
        String newDateString = dateFormat.format(newDate);
        Integer newDateInteger = Integer.parseInt(newDateString);
        String yearBaseType = "未知";
        if (newDateInteger >= 1940 && newDateInteger < 1950) {
            yearBaseType = "40后";
        }else if (newDateInteger >= 1950 && newDateInteger < 1960) {
            yearBaseType = "50后";
        }else if (newDateInteger >= 1960 && newDateInteger < 1970) {
            yearBaseType = "60后";
        }else if (newDateInteger >= 1970 && newDateInteger < 1980) {
            yearBaseType = "70后";
        }else if (newDateInteger >= 1980 && newDateInteger < 1990) {
            yearBaseType = "80后";
        }else if (newDateInteger >= 1990 && newDateInteger < 2000) {
            yearBaseType = "90后";
        }else if (newDateInteger >= 2000 && newDateInteger < 2010) {
            yearBaseType = "00后";
        }else if (newDateInteger >= 2010 && newDateInteger < 2020) {
            yearBaseType = "10后";
        }
        return yearBaseType;
    }
}
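
As a quick, illustrative check of the bucketing (assuming, as in the rest of the article, that the code runs in 2021):

// Illustrative only; the label depends on the year the code runs (the comments assume 2021).
public class YearsUntilDemo {
    public static void main(String[] args) {
        System.out.println(YearsUntil.getYears(24)); // 1997 -> 90后 (when run in 2021)
        System.out.println(YearsUntil.getYears(60)); // 1961 -> 60后 (when run in 2021)
    }
}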

Create a ClickUntil interface:

public interface ClickUntil {
    void saveData(String tablename,Map<String,String> data,Set<String> fields);
    ResultSet getQueryResult(String database, String sql) throws Exception;
}

An implementation class:

public class DefaultClickUntil implements ClickUntil {
    private static ClickUntil instance = new DefaultClickUntil();

    public static ClickUntil createInstance() {
        return instance;
    }

    private DefaultClickUntil() {

    }

    @Override
    public void saveData(String tablename, Map<String, String> data, Set<String> fields) {

    }

    @Override
    public ResultSet getQueryResult(String database, String sql) throws Exception {
        return null;
    }
}

Here we leave the interface methods unimplemented; a different implementation class will take this one's place later.
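
Since clickhouse-jdbc is already declared in the pom above, that later implementation could plausibly be a thin JDBC wrapper around ClickHouse. The following is only a sketch under that assumption; the connection URL and the handling of the fields parameter are illustrative, and this is not the implementation used later in the article.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Sketch only: a JDBC-backed ClickUntil against a local ClickHouse
// (jdbc:clickhouse://127.0.0.1:8123 is an assumption, adjust as needed).
public class ClickHouseClickUntil implements ClickUntil {

    private static final String BASE_URL = "jdbc:clickhouse://127.0.0.1:8123/";

    @Override
    public void saveData(String tablename, Map<String, String> data, Set<String> fields) {
        // Column names come straight from the map keys; the "fields" (key columns) are ignored in this sketch.
        List<String> keys = new ArrayList<>(data.keySet());
        String columns = String.join(",", keys);
        String placeholders = keys.stream().map(k -> "?").collect(Collectors.joining(","));
        String sql = "INSERT INTO " + tablename + " (" + columns + ") VALUES (" + placeholders + ")";
        try (Connection conn = DriverManager.getConnection(BASE_URL + "default");
             PreparedStatement ps = conn.prepareStatement(sql)) {
            for (int i = 0; i < keys.size(); i++) {
                ps.setString(i + 1, data.get(keys.get(i)));
            }
            ps.execute();
        } catch (Exception e) {
            throw new RuntimeException("ClickHouse insert failed", e);
        }
    }

    @Override
    public ResultSet getQueryResult(String database, String sql) throws Exception {
        // The caller would be responsible for closing the connection in a real implementation.
        Connection conn = DriverManager.getConnection(BASE_URL + database);
        return conn.createStatement().executeQuery(sql);
    }
}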

A ClickUntilFactory factory class:

public class ClickUntilFactory {
    public static ClickUntil createClickUntil() {
        return DefaultClickUntil.createInstance();
    }
}

A DateUntil utility class:

public class DateUntil {
    public static String getByInterMinute(String timeInfo) {
        Long timeMillons = Long.parseLong(timeInfo);
        Date date = new Date(timeMillons);
        DateFormat dateFormatMinute = new SimpleDateFormat("mm");
        DateFormat dateFormatHour = new SimpleDateFormat("yyyyMMddHH");
        String minute = dateFormatMinute.format(date);
        String hour = dateFormatHour.format(date);
        Long minuteLong = Long.parseLong(minute);
        String replaceMinute = "";
        if (minuteLong >= 0 && minuteLong < 5) {
            replaceMinute = "05";
        }else if (minuteLong >= 5 && minuteLong < 10) {
            replaceMinute = "10";
        }else if (minuteLong >= 10 && minuteLong < 15) {
            replaceMinute = "15";
        }else if (minuteLong >= 15 && minuteLong < 20) {
            replaceMinute = "20";
        }else if (minuteLong >= 20 && minuteLong < 25) {
            replaceMinute = "25";
        }else if (minuteLong >= 25 && minuteLong < 30) {
            replaceMinute = "30";
        }else if (minuteLong >= 30 && minuteLong < 35) {
            replaceMinute = "35";
        }else if (minuteLong >= 35 && minuteLong < 40) {
            replaceMinute = "40";
        }else if (minuteLong >= 40 && minuteLong < 45) {
            replaceMinute = "45";
        }else if (minuteLong >= 45 && minuteLong < 50) {
            replaceMinute = "50";
        }else if (minuteLong >= 50 && minuteLong < 55) {
            replaceMinute = "55";
        }else if (minuteLong >= 55 && minuteLong < 60) {
            replaceMinute = "60";
        }
        return hour + replaceMinute;
    }

    public static Long getCurrentFiveMinuteInterStart(Long visitTime) throws ParseException {
        String timeString = getByInterMinute(visitTime + "");
        DateFormat dateFormat = new SimpleDateFormat("yyyyMMddHHmm");
        Date date = dateFormat.parse(timeString);
        return date.getTime();
    }
}
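
As a quick, illustrative check of the 5-minute bucketing (the timestamp is arbitrary):

import java.text.SimpleDateFormat;

// Illustrative check: 16:07 falls into the "...1610" bucket, because minute 7 is rounded up to the next 5-minute mark.
public class DateUntilDemo {
    public static void main(String[] args) throws Exception {
        long ts = new SimpleDateFormat("yyyy-MM-dd HH:mm").parse("2021-11-05 16:07").getTime();
        System.out.println(DateUntil.getByInterMinute(ts + ""));          // 202111051610
        System.out.println(DateUntil.getCurrentFiveMinuteInterStart(ts)); // epoch millis of that bucket boundary
    }
}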

A YearsAnalyMap transformation class that implements the MapFunction interface:

public class YearsAnalyMap implements MapFunction<String,Years> {
    private ClickUntil clickUntil = ClickUntilFactory.createClickUntil();

    @Override
    @SuppressWarnings("unchecked")
    public Years map(String s) throws Exception {
        Map<String,String> datamap = JSONObject.parseObject(s,Map.class);
        String typecurrent = datamap.get("typecurrent");
        Years years = new Years();
        if (typecurrent.equals("INSERT")) {
            UserInfo userInfo = JSONObject.parseObject(s,UserInfo.class);
            String yearLabel = YearsUntil.getYears(userInfo.getAge());
            Map<String,String> mapdata = new HashMap<>();
            mapdata.put("userid",userInfo.getId() + "");
            mapdata.put("yearlabel",yearLabel);
            Set<String> fields = new HashSet<>();
            fields.add("userid");
            clickUntil.saveData("user_info",mapdata,fields);
            String fiveMinute = DateUntil.getByInterMinute(System.currentTimeMillis() + "");
            String groupField = "yearlable==" + fiveMinute + "==" + yearLabel;
            Long numbers = 1L;
            years.setGroupField(groupField);
            years.setNumbers(numbers);
        }
        return years;
    }
}

Finally, the Flink streaming job for the user-portrait decade label:

public class YearsAnaly {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        Properties properties = new Properties();
        properties.setProperty("bootstrap.servers","127.0.0.1:9092");
        properties.setProperty("group.id","portrait");
        FlinkKafkaConsumer<String> myConsumer = new FlinkKafkaConsumer<>("user_info",
                new SimpleStringSchema(),properties);
        DataStreamSource<String> data = env.addSource(myConsumer);
        env.enableCheckpointing(5000);
        data.map(new YearsAnalyMap());
        env.execute("portrait years");
    }
}

Now we aggregate the decade-label counts every 5 minutes and store them via a sink.

Add a YearsAnalyReduce aggregation class that implements the ReduceFunction interface:

public class YearsAnalyReduce implements ReduceFunction<Years> {
    @Override
    public Years reduce(Years years, Years t1) throws Exception {
        Long numbers1 = 0L;
        String groupField = "";
        if (years != null) {
            numbers1 = years.getNumbers();
            groupField = years.getGroupField();
        }
        Long numbers2 = 0L;
        if (t1 != null) {
            numbers2 = t1.getNumbers();
            groupField = t1.getGroupField();
        }
        if (StringUtils.isNotBlank(groupField)) {
            Years years1 = new Years();
            years1.setGroupField(groupField);
            years1.setNumbers(numbers1 + numbers2);
            return years1;
        }
        return null;
    }
}

A YearsAnalySink storage class that implements the SinkFunction interface:

public class YearsAnalySink implements SinkFunction<Years> {
    private ClickUntil clickUntil = ClickUntilFactory.createClickUntil();

    @Override
    public void invoke(Years value, Context context) throws Exception {
        if (value != null) {
            String groupField = value.getGroupField();
            String[] groupFields = groupField.split("==");
            String timeinfo = groupFields[1];
            String yearlabel = groupFields[2];
            Long numbers = value.getNumbers();
            String tablename = "yearslabel_info";
            Map<String, String> dataMap = new HashMap<>();
            dataMap.put("timeinfo",timeinfo);
            dataMap.put("yearslabel",yearlabel);
            dataMap.put("numbers",numbers + "");
            Set<String> fields = new HashSet<>();
            fields.add("timeinfo");
            fields.add("numbers");
            clickUntil.saveData(tablename,dataMap,fields);
        }
    }
}

Then the Flink streaming job:

public class YearsAnaly {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        Properties properties = new Properties();
        properties.setProperty("bootstrap.servers","127.0.0.1:9092");
        properties.setProperty("group.id","portrait");
        FlinkKafkaConsumer<String> myConsumer = new FlinkKafkaConsumer<>("user_info",
                new SimpleStringSchema(),properties);
        DataStreamSource<String> data = env.addSource(myConsumer);
        env.enableCheckpointing(5000);
        DataStream<Years> map = data.map(new YearsAnalyMap());
        DataStream<Years> reduce = map.keyBy(Years::getGroupField).timeWindowAll(Time.minutes(5))
                .reduce(new YearsAnalyReduce());
        reduce.addSink(new YearsAnalySink());
        env.execute("portrait years");
    }
}

The point of this is to see how users' decade labels shift across different time windows.

Creating the user-portrait mobile-carrier label

Create a mobile-carrier utility class, CarrierUntil:

public class CarrierUntil {
    /**
     * China Telecom number validation. Prefixes: 133, 153, 180, 181, 189, 177, 1700, 173, 199
     **/
    private static final String CHINA_TELECOM_PATTERN = "(^1(33|53|77|73|99|8[019])\\d{8}$)|(^1700\\d{7}$)";

    /**
     * China Unicom number validation. Prefixes: 130, 131, 132, 155, 156, 185, 186, 145, 176, 1709
     **/
    private static final String CHINA_UNICOM_PATTERN = "(^1(3[0-2]|4[5]|5[56]|7[6]|8[56])\\d{8}$)|(^1709\\d{7}$)";

    /**
     * China Mobile number validation.
     * Prefixes: 134, 135, 136, 137, 138, 139, 150, 151, 152, 157, 158, 159, 182, 183, 184, 187, 188, 147, 178, 1705
     **/
    private static final String CHINA_MOBILE_PATTERN = "(^1(3[4-9]|4[7]|5[0-27-9]|7[8]|8[2-478])\\d{8}$)|(^1705\\d{7}$)";

    /**
     * Returns the carrier of a phone number: 0 = unknown, 1 = China Mobile, 2 = China Unicom, 3 = China Telecom
     * @param telphone
     * @return
     */
    public static Integer getCarrierByTel(String telphone) {
        boolean b1 = StringUtils.isNotBlank(telphone) && match(CHINA_MOBILE_PATTERN, telphone);
        if (b1) {
            return 1;
        }
        b1 = StringUtils.isNotBlank(telphone) && match(CHINA_UNICOM_PATTERN, telphone);
        if (b1) {
            return 2;
        }
        b1 = StringUtils.isNotBlank(telphone) && match(CHINA_TELECOM_PATTERN, telphone);
        if (b1) {
            return 3;
        }
        return 0;
    }

    private static boolean match(String regex, String tel) {
        return Pattern.matches(regex, tel);
    }
}
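
As a quick, illustrative check, using the phone number from the earlier INSERT example:

// Quick check: 138xxxxxxxx matches the China Mobile pattern; an unrecognized number falls through to 0.
public class CarrierUntilDemo {
    public static void main(String[] args) {
        System.out.println(CarrierUntil.getCarrierByTel("13873697762")); // 1 -> 移动
        System.out.println(CarrierUntil.getCarrierByTel("12345"));       // 0 -> 未知
    }
}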

A carrier-label entity class:

@Data
public class Carrier {
    private Long userid;
    private String carrierName;
    private Long numbers = 0L;
    private String groupField;
}

A CarrierAnalyMap transformation class that implements the MapFunction interface:

public class CarrierAnalyMap implements MapFunction<String,Carrier> {
    private ClickUntil clickUntil = ClickUntilFactory.createClickUntil();

    @Override
    @SuppressWarnings("unchecked")
    public Carrier map(String s) throws Exception {
        Map<String,String> datamap = JSONObject.parseObject(s,Map.class);
        String typecurrent = datamap.get("typecurrent");
        Carrier carrier = new Carrier();
        if (typecurrent.equals("INSERT")) {
            UserInfo userInfo = JSONObject.parseObject(s,UserInfo.class);
            String telphone = userInfo.getPhone();
            Integer carrierInteger = CarrierUntil.getCarrierByTel(telphone);
            String carrierLabel = "";
            switch (carrierInteger) {
                case 0:
                    carrierLabel = "未知";
                    break;
                case 1:
                    carrierLabel = "移动";
                    break;
                case 2:
                    carrierLabel = "联通";
                    break;
                case 3:
                    carrierLabel = "电信";
                    break;
                default:
                    break;
            }
            Map<String,String> mapdata = new HashMap<>();
            mapdata.put("userid",userInfo.getId() + "");
            mapdata.put("carrierlabel",carrierLabel);
            Set<String> fields = new HashSet<>();
            fields.add("userid");
            clickUntil.saveData("user_info",mapdata,fields);
            String fiveMinute = DateUntil.getByInterMinute(System.currentTimeMillis() + "");
            String groupField = "carrierlabel==" + fiveMinute + "==" + carrierLabel;
            Long numbers = 1L;
            carrier.setGroupField(groupField);
            carrier.setNumbers(numbers);
        }
        return carrier;
    }
}

A CarrierAnalyReduce aggregation class that implements the ReduceFunction interface:

public class CarrierAnalyReduce implements ReduceFunction<Carrier> {
    @Override
    public Carrier reduce(Carrier carrier, Carrier t1) throws Exception {
        Long numbers1 = 0L;
        String groupField = "";
        if (carrier != null) {
            numbers1 = carrier.getNumbers();
            groupField = carrier.getGroupField();
        }
        Long numbers2 = 0L;
        if (t1 != null) {
            numbers2 = t1.getNumbers();
            groupField = t1.getGroupField();
        }
        if (StringUtils.isNotBlank(groupField)) {
            Carrier carrier1 = new Carrier();
            carrier1.setGroupField(groupField);
            carrier1.setNumbers(numbers1 + numbers2);
            return carrier1;
        }
        return null;
    }
}

A CarrierAnalySink storage class that implements the SinkFunction interface:

public class CarrierAnalySink implements SinkFunction<Carrier> {
    private ClickUntil clickUntil = ClickUntilFactory.createClickUntil();

    @Override
    public void invoke(Carrier value, Context context) throws Exception {
        if (value != null) {
            String groupField = value.getGroupField();
            String[] groupFields = groupField.split("==");
            String timeinfo = groupFields[1];
            String carrierLabel = groupFields[2];
            Long numbers = value.getNumbers();
            String tablename = "carrierlabel_info";
            Map<String, String> dataMap = new HashMap<>();
            dataMap.put("timeinfo",timeinfo);
            dataMap.put("carrierlabel",carrierLabel);
            dataMap.put("numbers",numbers + "");
            Set<String> fields = new HashSet<>();
            fields.add("timeinfo");
            fields.add("numbers");
            clickUntil.saveData(tablename,dataMap,fields);
        }
    }
}

Then the Flink streaming job:

public class CarrierAnaly {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        Properties properties = new Properties();
        properties.setProperty("bootstrap.servers","127.0.0.1:9092");
        properties.setProperty("group.id","portrait");
        FlinkKafkaConsumer<String> myConsumer = new FlinkKafkaConsumer<>("user_info",
                new SimpleStringSchema(),properties);
        DataStreamSource<String> data = env.addSource(myConsumer);
        env.enableCheckpointing(5000);
        DataStream<Carrier> map = data.map(new CarrierAnalyMap());
        DataStream<Carrier> reduce = map.keyBy(Carrier::getGroupField).timeWindowAll(Time.minutes(5))
                .reduce(new CarrierAnalyReduce());
        reduce.addSink(new CarrierAnalySink());
        env.execute("portrait carrier");
    }
}

Creating the user-portrait membership-level label

The membership-label entity class:

@Data
public class Member {
    private Long userid;
    private String memberFlag;
    private Long numbers = 0L;
    private String groupField;
}

A MemberAnalyMap transformation class that implements the MapFunction interface:

public class MemberAnalyMap implements MapFunction<String,Member> {
    private ClickUntil clickUntil = ClickUntilFactory.createClickUntil();

    @Override
    @SuppressWarnings("unchecked")
    public Member map(String s) throws Exception {
        Map<String,String> datamap = JSONObject.parseObject(s,Map.class);
        String typecurrent = datamap.get("typecurrent");
        Member member = new Member();
        if (typecurrent.equals("INSERT")) {
            UserInfo userInfo = JSONObject.parseObject(s,UserInfo.class);
            Integer memberInteger = userInfo.getStatus();
            String memberLabel = "";
            switch (memberInteger) {
                case 0:
                    memberLabel = "普通会员";
                    break;
                case 1:
                    memberLabel = "白银会员";
                    break;
                case 2:
                    memberLabel = "黄金会员";
                    break;
                default:
                    break;
            }
            Map<String,String> mapdata = new HashMap<>();
            mapdata.put("userid",userInfo.getId() + "");
            mapdata.put("memberlabel",memberLabel);
            Set<String> fields = new HashSet<>();
            fields.add("userid");
            clickUntil.saveData("user_info",mapdata,fields);
            String fiveMinute = DateUntil.getByInterMinute(System.currentTimeMillis() + "");
            String groupField = "memberlable==" + fiveMinute + "==" + memberLabel;
            Long numbers = 1L;
            member.setGroupField(groupField);
            member.setNumbers(numbers);
        }
        return member;
    }
}

A MemberAnalyReduce aggregation class that implements the ReduceFunction interface:

public class MemberAnalyReduce implements ReduceFunction<Member> {
    @Override
    public Member reduce(Member member, Member t1) throws Exception {
        Long numbers1 = 0L;
        String groupField = "";
        if (member != null) {
            numbers1 = member.getNumbers();
            groupField = member.getGroupField();
        }
        Long numbers2 = 0L;
        if (t1 != null) {
            numbers2 = t1.getNumbers();
            groupField = t1.getGroupField();
        }
        if (StringUtils.isNotBlank(groupField)) {
            Member member1 = new Member();
            member1.setGroupField(groupField);
            member1.setNumbers(numbers1 + numbers2);
            return member1;
        }
        return null;
    }
}

A MemberAnalySink storage class that implements the SinkFunction interface:

public class MemberAnalySink implements SinkFunction<Member> {
    private ClickUntil clickUntil = ClickUntilFactory.createClickUntil();

    @Override
    public void invoke(Member value, Context context) throws Exception {
        if (value != null) {
            String groupField = value.getGroupField();
            String[] groupFields = groupField.split("==");
            String timeinfo = groupFields[1];
            String memberLabel = groupFields[2];
            Long numbers = value.getNumbers();
            String tablename = "memberlabel_info";
            Map<String, String> dataMap = new HashMap<>();
            dataMap.put("timeinfo",timeinfo);
            dataMap.put("memberlabel",memberLabel);
            dataMap.put("numbers",numbers + "");
            Set<String> fields = new HashSet<>();
            fields.add("timeinfo");
            fields.add("numbers");
            clickUntil.saveData(tablename,dataMap,fields);
        }
    }
}

Then the Flink streaming job:

public class MemberAnaly {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        Properties properties = new Properties();
        properties.setProperty("bootstrap.servers","127.0.0.1:9092");
        properties.setProperty("group.id","portrait");
        FlinkKafkaConsumer<String> myConsumer = new FlinkKafkaConsumer<>("user_info",
                new SimpleStringSchema(),properties);
        DataStreamSource<String> data = env.addSource(myConsumer);
        env.enableCheckpointing(5000);
        DataStream<Member> map = data.map(new MemberAnalyMap());
        DataStream<Member> reduce = map.keyBy(Member::getGroupField).timeWindowAll(Time.minutes(5))
                .reduce(new MemberAnalyReduce());
        reduce.addSink(new MemberAnalySink());
        env.execute("portrait member");
    }
}

User-portrait behavioral features

Here we analyze several kinds of user behavior and build labels from them:

  1. Browsing a product: channel id, product id, product category id, browse time, dwell time, user id, device type (1 = PC, 2 = WeChat mini-program, 3 = app), deviceId.
  2. Favoriting a product: channel id, product id, product category id, operation time, operation type (favorite, cancel), user id, device type (1 = PC, 2 = WeChat mini-program, 3 = app)
  3. Shopping-cart actions: channel id, product id, product category id, operation time, operation type (add, remove), user id, device type (1 = PC, 2 = WeChat mini-program, 3 = app)
  4. Following a product: channel id, product id, product category id, operation time, operation type (follow, unfollow), user id, device type (1 = PC, 2 = WeChat mini-program, 3 = app)

Define entity classes for the four behaviors:

/**
 * Browse action
 */
@Data
public class ScanOpertor {
    /**
     * channel id
     */
    private Long channelId;
    /**
     * product type id
     */
    private Long productTypeId;
    /**
     * product id
     */
    private Long productId;
    /**
     * browse time
     */
    private Long scanTime;
    /**
     * dwell time
     */
    private Long stayTime;
    /**
     * user id
     */
    private Long userId;
    /**
     * device type
     */
    private Integer deviceType;
    /**
     * device id
     */
    private String deviceId;
}
/**
 * Favorite action
 */
@Data
public class CollectOpertor {
    /**
     * channel id
     */
    private Long channelId;
    /**
     * product type id
     */
    private Long productTypeId;
    /**
     * product id
     */
    private Long productId;
    /**
     * operation time
     */
    private Long opertorTime;
    /**
     * operation type
     */
    private Integer opertorType;
    /**
     * user id
     */
    private Long userId;
    /**
     * device type
     */
    private Integer deviceType;
    /**
     * device id
     */
    private String deviceId;
}
/**
 * Shopping-cart action
 */
@Data
public class CartOpertor {
    /**
     * channel id
     */
    private Long channelId;
    /**
     * product type id
     */
    private Long productTypeId;
    /**
     * product id
     */
    private Long productId;
    /**
     * operation time
     */
    private Long opertorTime;
    /**
     * operation type
     */
    private Integer opertorType;
    /**
     * user id
     */
    private Long userId;
    /**
     * device type
     */
    private Integer deviceType;
    /**
     * device id
     */
    private String deviceId;
}
/**
 * Follow action
 */
@Data
public class AttentionOpertor {
    /**
     * channel id
     */
    private Long channelId;
    /**
     * product type id
     */
    private Long productTypeId;
    /**
     * product id
     */
    private Long productId;
    /**
     * operation time
     */
    private Long opertorTime;
    /**
     * operation type
     */
    private Integer opertorType;
    /**
     * user id
     */
    private Long userId;
    /**
     * device type
     */
    private Integer deviceType;
    /**
     * device id
     */
    private String deviceId;
}

Run the following in the Kafka bin directory:

./kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic scan
./kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic collection
./kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic cart
./kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic attention

Create a product table:

DROP TABLE IF EXISTS `product`;
CREATE TABLE `product` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `product_type_id` bigint(20) DEFAULT NULL,
  `product_name` varchar(255) DEFAULT NULL,
  `product_title` varchar(255) DEFAULT NULL,
  `product_price` decimal(28,10) DEFAULT NULL,
  `product_desc` varchar(255) DEFAULT NULL,
  `merchant_id` bigint(20) DEFAULT NULL,
  `create_time` datetime DEFAULT NULL,
  `update_time` datetime DEFAULT NULL,
  `product_place` varchar(255) DEFAULT NULL,
  `product_brand` varchar(255) DEFAULT NULL,
  `product_weight` decimal(28,10) DEFAULT NULL,
  `product_specification` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

The corresponding entity class:

@Data
public class Product {
    private Long id;
    private Long productTypeId;
    private String productName;
    private String productTitle;
    private BigDecimal productPrice;
    private String productDesc;
    private Long merchantId;
    private Date createTime;
    private Date updateTime;
    private String productPlace;
    private String productBrand;
    private Double productWeight;
    private String productSpecification;
}

A product-type table:

DROP TABLE IF EXISTS `product_type`;
CREATE TABLE `product_type` (
  `id` bigint(20) NOT NULL,
  `product_type_name` varchar(255) DEFAULT NULL,
  `product_type_desc` varchar(255) DEFAULT NULL,
  `product_type_parent_id` bigint(20) DEFAULT NULL,
  `product_type_level` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

The corresponding entity class:

@Data
public class ProductType {
    private Long id;
    private String productTypeName;
    private String productTypeDesc;
    private Long productTypeParentId;
    private Integer productTypeLevel;
}

An order table:

DROP TABLE IF EXISTS `order`;
CREATE TABLE `order` (
  `id` bigint(20) NOT NULL,
  `amount` decimal(28,10) DEFAULT NULL,
  `user_id` bigint(20) DEFAULT NULL,
  `product_id` bigint(20) DEFAULT NULL,
  `product_type_id` int(11) DEFAULT NULL,
  `merchant_id` bigint(20) DEFAULT NULL,
  `create_time` datetime DEFAULT NULL,
  `pay_time` datetime DEFAULT NULL,
  `pay_status` int(11) DEFAULT NULL COMMENT '0 = unpaid, 1 = paid, 2 = refunded',
  `address` varchar(1000) DEFAULT NULL,
  `telphone` varchar(255) DEFAULT NULL,
  `username` varchar(255) DEFAULT NULL,
  `trade_number` varchar(255) DEFAULT NULL,
  `pay_type` int(255) DEFAULT NULL COMMENT '0 = Alipay, 1 = UnionPay, 2 = WeChat Pay',
  `number` int(11) DEFAULT NULL,
  `order_status` int(255) DEFAULT NULL COMMENT '0 = submitted, 1 = paid, 2 = cancelled, 3 = deleted',
  `update_time` datetime DEFAULT NULL,
  `advister_id` bigint(20) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

The corresponding entity class:

@Data
public class Order {
    private Long id;
    private BigDecimal amount;
    private Long userId;
    private Long productId;
    private Long productTypeId;
    private Long merchantId;
    private Date createTime;
    private Date payTime;
    private Integer payStatus; // payment status: 0 = unpaid, 1 = paid, 2 = refunded
    private String address;
    private String telphone;
    private String username;
    private String tradeNumber;
    private Integer payType;
    private Integer number;
    private Integer orderStatus;
    private Date updateTime;
    private Long advisterId; // advertisement id
}

Run the following in the HBase bin directory:

./hbase shell
create "product","info"
create "product_type","info"
create "order","info"

Run the following in the Kafka bin directory:

./kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic product
./kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic product_type
./kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic order

Since there are now multiple tables, TranferAnaly is modified as follows:

@Slf4j
public class TranferAnaly {

    @SuppressWarnings("unchecked")
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        Properties properties = new Properties();
        properties.setProperty("bootstrap.servers","127.0.0.1:9092");
        properties.setProperty("group.id","portrait");
        FlinkKafkaConsumer<String> myConsumer = new FlinkKafkaConsumer<>("test",
                new SimpleStringSchema(),properties);

        DataStreamSource<String> data = env.addSource(myConsumer);
        env.enableCheckpointing(5000);
        DataStream<JSONObject> dataJson = data.map(s -> JSONObject.parseObject(s))
                .filter(json -> json.getString("type").equals("INSERT"));
        DataStream<String> map = dataJson.map(jsonObject -> {
            String type = jsonObject.getString("type");
            String table = jsonObject.getString("table");
            String database = jsonObject.getString("database");
            String data1 = jsonObject.getString("data");
            JSONArray jsonArray = JSONObject.parseArray(data1);
            List<Map<String, String>> listdata = new ArrayList<>();
            for (int i = 0; i < jsonArray.size(); i++) {
                JSONObject jsonObject1 = jsonArray.getJSONObject(i);
                String tablename = table;
                String rowkey = jsonObject1.getString("id");
                String famliyname = "info";
                Map<String, String> datamap = JSONObject.parseObject(JSONObject.toJSONString(jsonObject1), Map.class);
                datamap.put("database", database);
                String typebefore = HbaseUtil.getdata(tablename, rowkey, famliyname, "typecurrent");
                datamap.put("typebefore", typebefore);
                datamap.put("typecurrent", type);
                datamap.put("tablename", table);
                HbaseUtil.put(tablename, rowkey, famliyname, datamap);
                listdata.add(datamap);
            }
            return JSONObject.toJSONString(listdata);
        });
        map.addSink(new SinkFunction<String>() {
            @Override
            public void invoke(String value, Context context) throws Exception {
                List<Map> data = JSONObject.parseArray(value,Map.class);
                for (Map<String,String> map : data) {
                    String tablename = map.get("tablename");
                    KafkaUtil.sendData(tablename,JSONObject.toJSONString(map));
                }
            }
        });
        env.execute("portrait tranfer");
    }
}

Create a Spring Boot project to collect the business behavior data.

Dependencies:

<properties>
   <java.version>1.8</java.version>
   <fastjson.version>1.2.74</fastjson.version>
</properties>
<dependencies>
   <dependency>
      <groupId>org.springframework.boot</groupId>
      <artifactId>spring-boot-starter-web</artifactId>
   </dependency>
   <dependency>
      <groupId>com.alibaba</groupId>
      <artifactId>fastjson</artifactId>
      <version>${fastjson.version}</version>
   </dependency>
   <dependency>
      <groupId>org.springframework.kafka</groupId>
      <artifactId>spring-kafka</artifactId>
   </dependency>
   <dependency>
      <groupId>org.projectlombok</groupId>
      <artifactId>lombok</artifactId>
      <optional>true</optional>
   </dependency>
   <dependency>
      <groupId>org.springframework.boot</groupId>
      <artifactId>spring-boot-starter-test</artifactId>
      <scope>test</scope>
   </dependency>
</dependencies>

The configuration file:

spring:
  kafka:
    bootstrap-servers: 127.0.0.1:9092
    producer:
      retries: 0
      batch-size: 16384
      buffer-memory: 33554432
      key-serializer: org.apache.kafka.common.serialization.StringSerializer
      value-serializer: org.apache.kafka.common.serialization.StringSerializer
      acks: -1
    consumer:
      group-id: portrait
      auto-offset-reset: earliest
      enable-auto-commit: false
      auto-commit-interval: 100
      key-deserializer: org.apache.kafka.common.serialization.StringDeserializer
      value-deserializer: org.apache.kafka.common.serialization.StringDeserializer
      max-poll-records: 10
    listener:
      concurrency: 3
      type: batch
      ack-mode: manual

The Kafka producer:

@Component
@Slf4j
public class KafkaProducer {
    @Autowired
    private KafkaTemplate<String,String> kafkaTemplate;

    @SuppressWarnings("unchecked")
    public void produce(String topic,String message) {
        try {
            ListenableFuture future = kafkaTemplate.send(topic, message);
            SuccessCallback<SendResult<String,String>> successCallback = new SuccessCallback<SendResult<String, String>>() {
                @Override
                public void onSuccess(@Nullable SendResult<String, String> result) {
                    log.info("发送消息成功");
                }
            };
            FailureCallback failureCallback = new FailureCallback() {
                @Override
                public void onFailure(Throwable ex) {
                    log.error("发送消息失败",ex);
                    produce(topic,message);
                }
            };
            future.addCallback(successCallback,failureCallback);
        } catch (Exception e) {
            log.error("发送消息异常",e);
        }
    }
}

Data collection controller

@RestController
public class DataController {
    @Autowired
    private KafkaProducer kafkaProducer;

    @PostMapping("/revicedata")
    public void reviceData(@RequestBody String data) {
        JSONObject jsonObject = JSONObject.parseObject(data);
        String type = jsonObject.getString("type");
        String topic = "";
        switch (type) {
            case "0":
                topic = "scan";
                break;
            case "1":
                topic = "collection";
                break;
            case "2":
                topic = "cart";
                break;
            case "3":
                topic = "attention";
                break;
            default:
                break;
        }
        kafkaProducer.produce(topic,data);
    }
}
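
To exercise this endpoint locally, something like the following sketch can be used. The port 8080 and the JSON fields other than type (userId, productId, productTypeId, scanTime) are assumptions; they should match whatever the ScanOpertor class used later actually expects.

import org.springframework.http.HttpEntity;
import org.springframework.http.HttpHeaders;
import org.springframework.http.MediaType;
import org.springframework.web.client.RestTemplate;

public class ReviceDataClient {
    public static void main(String[] args) {
        //A hypothetical scan event; type "0" routes it to the "scan" topic
        String body = "{\"type\":\"0\",\"userId\":1,\"productId\":101,\"productTypeId\":3,\"scanTime\":"
                + System.currentTimeMillis() + "}";
        HttpHeaders headers = new HttpHeaders();
        headers.setContentType(MediaType.APPLICATION_JSON);
        //Assumes the Spring Boot application is running on the default port 8080
        new RestTemplate().postForEntity("http://127.0.0.1:8080/revicedata",
                new HttpEntity<>(body, headers), String.class);
    }
}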

This is really just a simple simulation of user behavior; properly we should build a logging microservice that collects every user action. For details you can refer to "AOP原理与自实现" (AOP principles and a self-implementation) and adapt it for this purpose: replace its Log class, shown here,

@Builder
@Data
@NoArgsConstructor
@AllArgsConstructor
public class Log implements Serializable {

   private static final long serialVersionUID = -5398795297842978376L;

   private Long id;
   private String username;
   /** 模块 */
   private String module;
   /** 参数值 */
   private String params;
   private String remark;
   private Boolean flag;
   private Date createTime;
   private String ip;
   private String area;
}

with the various operation classes above. A few other changes are naturally needed as well, which I will not go through here; in addition, swap RabbitMQ for Kafka.

Creating the user-portrait product-category preference label

Create a product-type label entity class

@Data
public class ProductTypeLabel {
    private Long userid;
    private String productTypeId;
    private Long numbers = 0L;
    private String groupField;
}

Add a method to the DateUntil utility class that truncates a timestamp to the start of its hour, returned as epoch milliseconds.

public static Long getCurrentHourStart(Long visitTime) throws ParseException {
    Date date = new Date(visitTime);
    DateFormat dateFormat = new SimpleDateFormat("yyyyMMdd HH");
    Date filterTime = dateFormat.parse(dateFormat.format(date));
    return filterTime.getTime();
}
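
A quick sanity check of the truncation behaviour; the demo class below is only an illustration and assumes DateUntil is on the classpath.

import java.text.SimpleDateFormat;
import java.util.Date;

public class HourStartDemo {
    public static void main(String[] args) throws Exception {
        //e.g. 2021-11-10 14:37:25.123 becomes 2021-11-10 14:00:00.000
        Long hourStart = DateUntil.getCurrentHourStart(System.currentTimeMillis());
        System.out.println(new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS").format(new Date(hourStart)));
    }
}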

Create a ProductTypeAnalyMap transformation class that implements the MapFunction interface

public class ProductTypeAnalyMap implements MapFunction<String,ProductTypeLabel> {
    @Override
    public ProductTypeLabel map(String s) throws Exception {
        ScanOpertor scanOpertor = JSONObject.parseObject(s, ScanOpertor.class);
        Long userid = scanOpertor.getUserId();
        Long productTypeId = scanOpertor.getProductTypeId();
        String tablename = "user_info";
        String rowkey = userid + "";
        String famliyname = "info";
        String colum = "producttypelist";
        //Fetch the user's historical product-type preferences
        String productTypeListString = HbaseUtil.getdata(tablename, rowkey, famliyname, colum);
        List<Map> temp = new ArrayList<>();
        List<Map<String,Long>> result = new ArrayList<>();
        if (StringUtils.isNotBlank(productTypeListString)) {
            temp = JSONObject.parseArray(productTypeListString,Map.class);
        }
        boolean matched = false;
        for (Map map : temp) {
            Long productTypeId1 = Long.parseLong(map.get("key").toString());
            Long value = Long.parseLong(map.get("value").toString());
            //If the new product type matches a historical one, increase its preference value by 1
            if (productTypeId.equals(productTypeId1)) {
                value++;
                map.put("value",value);
                matched = true;
            }
            result.add(map);
        }
        //A product type never seen before is added with an initial preference of 1
        if (!matched) {
            Map<String,Long> newEntry = new HashMap<>();
            newEntry.put("key",productTypeId);
            newEntry.put("value",1L);
            result.add(newEntry);
        }
        Collections.sort(result,(o1,o2) -> {
            Long value1 = Long.parseLong(o1.get("value").toString());
            Long value2 = Long.parseLong(o2.get("value").toString());
            return value2.compareTo(value1);
        });
        if (result.size() > 5) {
            result = result.subList(0,5);
        }
        String data = JSONObject.toJSONString(result);
        HbaseUtil.putdata(tablename,rowkey,famliyname,colum,data);
        ProductTypeLabel productTypeLabel = new ProductTypeLabel();
        //groupField format: productType==timehour==productTypeId
        String groupField = "productType==" + DateUntil.getCurrentHourStart(System.currentTimeMillis())
                + "==" + productTypeId;
        productTypeLabel.setUserid(userid);
        productTypeLabel.setProductTypeId(productTypeId + "");
        productTypeLabel.setNumbers(1L);
        productTypeLabel.setGroupField(groupField);
        return productTypeLabel;
    }
}

A ProductTypeAnalyReduce aggregation class that implements the ReduceFunction interface

public class ProductTypeAnalyReduce implements ReduceFunction<ProductTypeLabel> {
    @Override
    public ProductTypeLabel reduce(ProductTypeLabel productTypeLabel, ProductTypeLabel t1) throws Exception {
        Long numbers1 = 0L;
        String groupField = "";
        if (productTypeLabel != null) {
            numbers1 = productTypeLabel.getNumbers();
            groupField = productTypeLabel.getGroupField();
        }
        Long numbers2 = 0L;
        if (t1 != null) {
            numbers2 = t1.getNumbers();
            groupField = t1.getGroupField();
        }
        if (StringUtils.isNotBlank(groupField)) {
            ProductTypeLabel productTypeLabel1 = new ProductTypeLabel();
            productTypeLabel1.setGroupField(groupField);
            productTypeLabel1.setNumbers(numbers1 + numbers2);
            return productTypeLabel1;
        }
        return null;
    }
}

A ProductTypeAnalySink storage class that implements the SinkFunction interface

public class ProductTypeAnalySink implements SinkFunction<ProductTypeLabel> {
    private ClickUntil clickUntil = ClickUntilFactory.createClickUntil();

    @Override
    public void invoke(ProductTypeLabel value, Context context) throws Exception {
        if (value != null) {
            String groupField = value.getGroupField();
            String[] groupFields = groupField.split("==");
            String timeinfo = groupFields[1];
            String productTypeLabel = groupFields[2];
            Long numbers = value.getNumbers();
            String tablename = "producttypelabel_info";
            Map<String, String> dataMap = new HashMap<>();
            dataMap.put("timeinfo",timeinfo);
            dataMap.put("producttypelabel",productTypeLabel);
            dataMap.put("numbers",numbers + "");
            Set<String> fields = new HashSet<>();
            fields.add("timeinfo");
            fields.add("numbers");
            clickUntil.saveData(tablename,dataMap,fields);
        }
    }
}

Then the Flink streaming job

public class ProductTypeAnaly {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        Properties properties = new Properties();
        properties.setProperty("bootstrap.servers","127.0.0.1:9092");
        properties.setProperty("group.id","portrait");
        FlinkKafkaConsumer<String> myConsumer = new FlinkKafkaConsumer<>("scan",
                new SimpleStringSchema(),properties);
        DataStreamSource<String> data = env.addSource(myConsumer);
        env.enableCheckpointing(5000);
        DataStream<ProductTypeLabel> map = data.map(new ProductTypeAnalyMap());
        DataStream<ProductTypeLabel> reduce = map.keyBy(ProductTypeLabel::getGroupField)
                .window(TumblingProcessingTimeWindows.of(Time.hours(1)))
                .reduce(new ProductTypeAnalyReduce());
        reduce.addSink(new ProductTypeAnalySink());
        env.execute("portrait scan");
    }
}

Creating the user-portrait hesitation-product label (products the user keeps putting in the cart without buying)

Create a hesitation-product label entity class

@Data
public class TangleProduct {
    private Long userid;
    private String productId;
    private Long numbers = 0L;
    private String groupField;
}

A TangleProductAnalyMap transformation class that implements the MapFunction interface

public class TangleProductAnalyMap implements MapFunction<String,TangleProduct> {
    @Override
    public TangleProduct map(String s) throws Exception {
        CartOpertor cartOpertor = JSONObject.parseObject(s, CartOpertor.class);
        Long userid = cartOpertor.getUserId();
        Long productId = cartOpertor.getProductId();
        String tablename = "user_info";
        String rowkey = userid + "";
        String famliyname = "info";
        String colum = "tangleproducts";
        //Fetch the products the user has historically hesitated over
        String tangleProducts = HbaseUtil.getdata(tablename, rowkey, famliyname, colum);
        List<Map> temp = new ArrayList<>();
        List<Map<String,Long>> result = new ArrayList<>();
        if (StringUtils.isNotBlank(tangleProducts)) {
            temp = JSONObject.parseArray(tangleProducts,Map.class);
        }
        boolean matched = false;
        for (Map map : temp) {
            Long productId1 = Long.parseLong(map.get("key").toString());
            Long value = Long.parseLong(map.get("value").toString());
            //If the new product matches a historical one, increase its value by 1
            if (productId.equals(productId1)) {
                value++;
                map.put("value",value);
                matched = true;
            }
            result.add(map);
        }
        //A product never seen before is added with an initial value of 1
        if (!matched) {
            Map<String,Long> newEntry = new HashMap<>();
            newEntry.put("key",productId);
            newEntry.put("value",1L);
            result.add(newEntry);
        }
        Collections.sort(result,(o1, o2) -> {
            Long value1 = Long.parseLong(o1.get("value").toString());
            Long value2 = Long.parseLong(o2.get("value").toString());
            return value2.compareTo(value1);
        });
        if (result.size() > 5) {
            result = result.subList(0,5);
        }
        String data = JSONObject.toJSONString(result);
        HbaseUtil.putdata(tablename,rowkey,famliyname,colum,data);
        TangleProduct tangleProduct = new TangleProduct();
        //groupField format: tangleProduct==timehour==productId
        String groupField = "tangleProduct==" + DateUntil.getCurrentHourStart(System.currentTimeMillis())
                + "==" + productId;
        tangleProduct.setUserid(userid);
        tangleProduct.setProductId(productId + "");
        tangleProduct.setNumbers(1L);
        tangleProduct.setGroupField(groupField);
        return tangleProduct;
    }
}

A TangleProductAnalyReduct aggregation class that implements the ReduceFunction interface

public class TangleProductAnalyReduct implements ReduceFunction<TangleProduct> {
    @Override
    public TangleProduct reduce(TangleProduct tangleProduct, TangleProduct t1) throws Exception {
        Long numbers1 = 0L;
        String groupField = "";
        if (tangleProduct != null) {
            numbers1 = tangleProduct.getNumbers();
            groupField = tangleProduct.getGroupField();
        }
        Long numbers2 = 0L;
        if (t1 != null) {
            numbers2 = t1.getNumbers();
            groupField = t1.getGroupField();
        }
        if (StringUtils.isNotBlank(groupField)) {
            TangleProduct tangleProduct1 = new TangleProduct();
            tangleProduct1.setGroupField(groupField);
            tangleProduct1.setNumbers(numbers1 + numbers2);
            return tangleProduct1;
        }
        return null;
    }
}

A TangleProductAnalySink storage class that implements the SinkFunction interface

public class TangleProductAnalySink implements SinkFunction<TangleProduct> {
    private ClickUntil clickUntil = ClickUntilFactory.createClickUntil();

    @Override
    public void invoke(TangleProduct value, Context context) throws Exception {
        if (value != null) {
            String groupField = value.getGroupField();
            String[] groupFields = groupField.split("==");
            String timeinfo = groupFields[1];
            String tangleProductLabel = groupFields[2];
            Long numbers = value.getNumbers();
            String tablename = "tangleproductlabel_info";
            Map<String, String> dataMap = new HashMap<>();
            dataMap.put("timeinfo",timeinfo);
            dataMap.put("tangleproductlabel",tangleProductLabel);
            dataMap.put("numbers",numbers + "");
            Set<String> fields = new HashSet<>();
            fields.add("timeinfo");
            fields.add("numbers");
            clickUntil.saveData(tablename,dataMap,fields);
        }
    }
}

Then the Flink streaming job

public class TangleProductAnaly {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        Properties properties = new Properties();
        properties.setProperty("bootstrap.servers","127.0.0.1:9092");
        properties.setProperty("group.id","portrait");
        FlinkKafkaConsumer<String> myConsumer = new FlinkKafkaConsumer<>("cart",
                new SimpleStringSchema(),properties);
        DataStreamSource<String> data = env.addSource(myConsumer);
        env.enableCheckpointing(5000);
        DataStream<TangleProduct> map = data.map(new TangleProductAnalyMap());
        DataStream<TangleProduct> reduce = map.keyBy(TangleProduct::getGroupField)
                .window(TumblingProcessingTimeWindows.of(Time.hours(1)))
                .reduce(new TangleProductAnalyReduct());
        reduce.addSink(new TangleProductAnalySink());
        env.execute("portrait cart");
    }
}

Predicting user gender and building the user-portrait gender label

Predicting a user's gender is essentially a binary classification problem, so any classification algorithm can be used (logistic regression, naive Bayes, GBDT, LightGBM, and so on), but first we need to build training and test datasets. Because users fill in their gender fairly casually, not every recorded gender is correct, so here we predict the gender and attach a user-portrait gender label to the test data.

We need a few metrics to build the feature dataset; the features are as follows

User id
Number of orders
Order frequency
Views of men's clothing
Views of children's clothing
Views of elderly clothing
Views of women's clothing
Average order amount
Product view frequency

The label is

label: 0 = male, 1 = female

A user's order count, order frequency (average times per month), and average order amount can all be computed directly from the database, while the view counts come from HBase. Join these together and write the result to train.csv. The gender in this file must be accurate, so obtain the real gender through some reliable channel.

In Hadoop's bin directory run

./hdfs dfs -put /Users/admin/Downloads/train.csv /

Records whose gender cannot be determined reliably go into test.csv, which has no label column. In Hadoop's bin directory run

./hdfs dfs -put /Users/admin/Downloads/test.csv /

Create a gender label class

@Data
public class Sex {
    private Long userid;//user id
    private Long ordernums;//number of orders
    private Long orderintenums;//order frequency
    private Long manClothes;//views of men's clothing
    private Long chidrenClothes;//views of children's clothing
    private Long oldClothes;//views of elderly clothing
    private Long womenClothes;//views of women's clothing
    private Double ordermountavg;//average order amount
    private Long productscannums;//product view frequency
    private Integer label;//0 male, 1 female
    private String groupField;
    private String sex;
    private Long numbers;
}

A SexAnalyMap transformation class

public class SexAnalyMap implements MapFunction<Tuple10<Long, Long, Long, Long, Long, Long, Long, Double, Long, Integer>,Sex> {

    @Override
    public Sex map(Tuple10<Long, Long, Long, Long, Long, Long, Long, Double, Long, Integer> value) throws Exception {
        Random random = new Random();
        String groupField = "sex==" + random.nextInt(100);
        Sex sex = new Sex();
        sex.setUserid(value.getField(0));
        sex.setOrdernums(value.getField(1));
        sex.setOrderintenums(value.getField(2));
        sex.setManClothes(value.getField(3));
        sex.setChidrenClothes(value.getField(4));
        sex.setOldClothes(value.getField(5));
        sex.setWomenClothes(value.getField(6));
        sex.setOrdermountavg(value.getField(7));
        sex.setProductscannums(value.getField(8));
        sex.setLabel(value.getField(9));
        sex.setGroupField(groupField);
        return sex;
    }
}

Add a new method to DateUntil that returns the start of the current week (Monday 00:00) in epoch milliseconds

public static Long getCurrentWeekStart(Long visitTime) {
    Calendar cal = Calendar.getInstance();
    if (visitTime != null) {
        cal.setTimeInMillis(visitTime);
    }
    //Treat Monday as the first day of the week so the result does not depend on the default locale
    cal.setFirstDayOfWeek(Calendar.MONDAY);
    cal.set(Calendar.DAY_OF_WEEK, Calendar.MONDAY);
    cal.set(Calendar.HOUR_OF_DAY, 0);
    cal.set(Calendar.MINUTE, 0);
    cal.set(Calendar.SECOND, 0);
    cal.set(Calendar.MILLISECOND, 0);
    return cal.getTimeInMillis();
}

A SexSaveMap transformation class implementing the MapFunction interface that writes the predicted gender labels of the test dataset to HBase. With the week-level time key we can track how the gender distribution changes from week to week.

public class SexSaveMap implements MapFunction<Sex,Sex> {
    @Override
    public Sex map(Sex value) throws Exception {
        if (value.getLabel() == 0) {
            value.setSex("男");
        }else if (value.getLabel() == 1) {
            value.setSex("女");
        }
        String tablename = "user_info";
        String rowkey = value.getUserid() + "";
        String famliyname = "info";
        String colum = "sexlabel";
        HbaseUtil.putdata(tablename,rowkey,famliyname,colum,value.getSex());
        Long timeinfo = DateUntil.getCurrentWeekStart(System.currentTimeMillis());
        String groupField = "sexlabel==" + timeinfo + "==" + value.getSex();
        Long numbers = 1L;
        value.setGroupField(groupField);
        value.setNumbers(numbers);
        return value;
    }
}

A SexAnalyReduct aggregation class that implements the ReduceFunction interface

public class SexAnalyReduct implements ReduceFunction<Sex> {
    @Override
    public Sex reduce(Sex value1, Sex value2) throws Exception {
        Long numbers1 = 0L;
        String groupField = "";
        if (value1 != null) {
            numbers1 = value1.getNumbers();
            groupField = value1.getGroupField();
        }
        Long numbers2 = 0L;
        if (value2 != null) {
            numbers2 = value2.getNumbers();
            groupField = value2.getGroupField();
        }
        if (StringUtils.isNotBlank(groupField)) {
            Sex sex = new Sex();
            sex.setGroupField(groupField);
            sex.setNumbers(numbers1 + numbers2);
            return sex;
        }
        return null;
    }
}

Then the Flink batch job. Note that the DataSet batch API does not have the streaming SinkFunction interface, so results are collected and written out directly. Alink's logistic regression is used to predict the gender of the test dataset.

public class SexAnaly {
    private static ClickUntil clickUntil = ClickUntilFactory.createClickUntil();

    public static void main(String[] args) throws Exception {
        String filePath = "hdfs://127.0.0.1:9000/train.csv";
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        DataSet<Tuple10<Long, Long, Long, Long, Long, Long, Long, Double, Long, Integer>> fileSourceTrain = env.readCsvFile(filePath).ignoreFirstLine()
                .types(Long.class, Long.class, Long.class, Long.class, Long.class, Long.class,
                        Long.class, Double.class, Long.class, Integer.class);
        DataSet<Sex> mapTrain = fileSourceTrain.map(new SexAnalyMap());
        List<Sex> sexes = mapTrain.collect();
        List<Row> df = sexes.stream().map(sex -> Row.of(sex.getUserid(), sex.getOrdernums(),
                sex.getOrderintenums(), sex.getManClothes(), sex.getChidrenClothes(),
                sex.getOldClothes(), sex.getWomenClothes(), sex.getOrdermountavg(),
                sex.getProductscannums(), sex.getLabel()))
                .collect(Collectors.toList());
        BatchOperator<?> input = new MemSourceBatchOp(df,"f0 long,f1 long,f2 long," +
                "f3 long,f4 long,f5 long,f6 long,f7 double,f8 long,f9 int");
        //Train a logistic regression model on the training data
        BatchOperator<?> lr = new LogisticRegressionTrainBatchOp()
                .setFeatureCols("f0","f1","f2","f3","f4","f5","f6","f7","f8")
                .setLabelCol("f9");
        BatchOperator model = input.link(lr);
        String testFilePath = "hdfs://127.0.0.1:9000/test.csv";
        DataSet<Tuple9<Long, Long, Long, Long, Long, Long, Long, Double, Long>> fileSourceTest = env.readCsvFile(testFilePath).ignoreFirstLine()
                .types(Long.class, Long.class, Long.class, Long.class, Long.class, Long.class,
                Long.class, Double.class, Long.class);
        List<Sex> testSexes = fileSourceTest.map(new SexTestMap()).collect();
        List<Row> testDf = testSexes.stream().map(sex -> Row.of(sex.getUserid(), sex.getOrdernums(),
                sex.getOrderintenums(), sex.getManClothes(), sex.getChidrenClothes(),
                sex.getOldClothes(), sex.getWomenClothes(), sex.getOrdermountavg(),
                sex.getProductscannums()))
                .collect(Collectors.toList());
        BatchOperator<?> testInput = new MemSourceBatchOp(testDf,"f0 long,f1 long,f2 long," +
                "f3 long,f4 long,f5 long,f6 long,f7 double,f8 long");
        BatchOperator dataTest = testInput;
        BatchOperator <?> predictor = new LogisticRegressionPredictBatchOp().setPredictionCol("pred");
        //Predict on the test data
        List<Row> predicts = predictor.linkFrom(model, dataTest).collect();
        List<Sex> predictSexes = predicts.stream().map(row -> {
            Sex sex = new Sex();
            sex.setUserid((Long) row.getField(0));
            sex.setOrdernums((Long) row.getField(1));
            sex.setOrderintenums((Long) row.getField(2));
            sex.setManClothes((Long) row.getField(3));
            sex.setChidrenClothes((Long) row.getField(4));
            sex.setOldClothes((Long) row.getField(5));
            sex.setWomenClothes((Long) row.getField(6));
            sex.setOrdermountavg((Double) row.getField(7));
            sex.setProductscannums((Long) row.getField(8));
            sex.setLabel((Integer) row.getField(9));
            return sex;
        }).collect(Collectors.toList());
        DataSet<Sex> predictSource = env.fromCollection(predictSexes);
        DataSet<Sex> mapSave = predictSource.map(new SexSaveMap());
        DataSet<Sex> reduce = mapSave.groupBy(Sex::getGroupField).reduce(new SexAnalyReduct());
        List<Sex> saveList = reduce.collect();
        for (Sex sex : saveList) {
            String groupField = sex.getGroupField();
            String[] groupFields = groupField.split("==");
            String timeinfo = groupFields[1];
            String sexLabel = groupFields[2];
            Long numbers = sex.getNumbers();
            String tablename = "sexlabel_info";
            Map<String, String> dataMap = new HashMap<>();
            dataMap.put("timeinfo", timeinfo);
            dataMap.put("sexlabel", sexLabel);
            dataMap.put("numbers", numbers + "");
            Set<String> fields = new HashSet<>();
            fields.add("timeinfo");
            fields.add("numbers");
            clickUntil.saveData(tablename,dataMap,fields);
        }
    }
}

My test dataset contains 3 rows. Open the HBase shell and scan the info:sexlabel column of user_info:

scan 'user_info',{COLUMNS=>'info:sexlabel'}
ROW                            COLUMN+CELL                                                                             
 1                             column=info:sexlabel, timestamp=1636522706964, value=\xE7\x94\xB7                       
 2                             column=info:sexlabel, timestamp=1636522706954, value=\xE5\xA5\xB3                       
 3                             column=info:sexlabel, timestamp=1636522706966, value=\xE7\x94\xB7                       

ClickHouse Docker installation and deployment

ClickHouse fills the same niche as Hive does in the Hadoop ecosystem, but as a columnar OLAP engine it is far faster and makes a more capable data warehouse for analytics.

Install ClickHouse with Docker

docker pull yandex/clickhouse-client
docker pull yandex/clickhouse-server
docker run -d --name ck-server -p 8123:8123 -p 9001:9000 -p 9009:9009 --ulimit nofile=262144:262144 -v /Users/admin/Downloads/clickhouse_database/:/var/lib/clickhouse yandex/clickhouse-server

Enter the ClickHouse client

docker exec -it ck-server clickhouse-client

Create a database named test

create database test ENGINE=Ordinary;
use test;

Create two tables

create table testo2(id UInt16,col1 String,col2 String,create_date date)ENGINE=MergeTree(create_date,(id),8192);
create table test(id UInt16,name String,create_date Date)ENGINE=MergeTree(create_date,(id),8192);

The MergeTree engine used here (legacy syntax) requires a date column and a primary key; 8192 is the index granularity, which is the default value.

Now insert four rows into the test table

insert into test(id,name,create_date) values(1,'小白','2021-10-10');
insert into test(id,name,create_date) values(2,'小黄','2021-10-10');
insert into test(id,name,create_date) values(3,'小花','2021-10-10');
insert into test(id,name,create_date) values(4,'小王','2021-10-10');

Query the test table

select * from test;

Result

Query id: 15b1e461-5287-455b-a483-31dd3ed1ae84

┌─id─┬─name─┬─create_date─┐
│  1 │ 小白 │  2021-10-10 │
└────┴──────┴─────────────┘
┌─id─┬─name─┬─create_date─┐
│  3 │ 小花 │  2021-10-10 │
└────┴──────┴─────────────┘
┌─id─┬─name─┬─create_date─┐
│  2 │ 小黄 │  2021-10-10 │
└────┴──────┴─────────────┘
┌─id─┬─name─┬─create_date─┐
│  4 │ 小王 │  2021-10-10 │
└────┴──────┴─────────────┘

4 rows in set. Elapsed: 0.016 sec. 

Exit ClickHouse and create a new folder

mkdir ck-config
cd ck-config/
docker cp ck-server:/etc/clickhouse-server/config.xml ./
vim config.xml

Find <listen_host>::</listen_host>, remove the comment markers around it, and save config.xml

Stop and remove ck-server, then start the container again so that ClickHouse can be accessed externally.

docker stop ck-server
docker rm ck-server
docker run -d --name ck-server --ulimit nofile=262144:262144 -p 8123:8123 -p 9001:9000 -p 9009:9009 -v /Users/admin/Downloads/ck-config/config.xml:/etc/clickhouse-server/config.xml -v /Users/admin/Downloads/clickhouse_database/:/var/lib/clickhouse yandex/clickhouse-server

Java dependency

<dependency>
   <groupId>ru.yandex.clickhouse</groupId>
   <artifactId>clickhouse-jdbc</artifactId>
   <version>0.1.40</version>
</dependency>

Write a test class that connects to ClickHouse

public class ClickHouseTest {
    public static void main(String[] args) {
        String sql = "select create_date,count(1) as numbers from test where id != 1 group by create_date";
        exeSql(sql);
    }


    public static void exeSql(String sql){
        String address = "jdbc:clickhouse://127.0.0.1:8123/test";
        Connection connection = null;
        Statement statement = null;
        ResultSet results = null;
        try {
            Class.forName("ru.yandex.clickhouse.ClickHouseDriver");
            connection = DriverManager.getConnection(address);
            statement = connection.createStatement();
            long begin = System.currentTimeMillis();
            results = statement.executeQuery(sql);
            long end = System.currentTimeMillis();
            System.out.println("执行("+sql+")耗时:"+(end-begin)+"ms");
            ResultSetMetaData rsmd = results.getMetaData();
            List<Map> list = new ArrayList();
            while(results.next()){
                Map map = new HashMap();
                for(int i = 1;i<=rsmd.getColumnCount();i++){
                    map.put(rsmd.getColumnName(i),results.getString(rsmd.getColumnName(i)));
                }
                list.add(map);
            }
            for(Map map : list){
                System.err.println(map);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }finally {//close the connection
            try {
                if(results!=null){
                    results.close();
                }
                if(statement!=null){
                    statement.close();
                }
                if(connection!=null){
                    connection.close();
                }
            } catch (SQLException e) {
                e.printStackTrace();
            }
        }
    }
}

Output

{numbers=3, create_date=2021-10-10}
执行(select create_date,count(1) as numbers from test where id != 1 group by create_date)耗时:20ms

Earlier we defined a ClickUntil interface whose implementation DefaultClickUntil left the interface methods unimplemented; we now replace it with another implementation

@Slf4j
public class ClickHouseUntil implements ClickUntil {
    private static ClickUntil instance = new ClickHouseUntil();

    private ClickHouseUntil() {
    }

    public static ClickUntil createInstance() {
        return instance;
    }

    @Override
    public void saveData(String tablename, Map<String, String> data,Set<String> fields) {
        String resultsql = "insert into ";
        resultsql += tablename +" (";
        String valuesql = "(";
        Set<Map.Entry<String,String>> sets =  data.entrySet();
        for(Map.Entry<String,String> map:sets){
            String fieldName = map.getKey();
            String valuestring = map.getValue();
            resultsql += fieldName + ",";
            if(fields.contains(fieldName)){
                valuesql += valuestring + ",";
            }else {
                valuesql += "'"+valuestring + "'" + ",";
            }

        }
        resultsql = resultsql.substring(0,resultsql.length() - 1) + ")";
        valuesql = valuesql.substring(0,valuesql.length() - 1) + ")";
        resultsql = resultsql + " values "+ valuesql;
        log.info(resultsql);
        try (Connection connection = getConnection("jdbc:clickhouse://127.0.0.1:8123/test","ru.yandex.clickhouse.ClickHouseDriver");
             Statement statement = connection.createStatement()) {
            statement.execute(resultsql);//execute the insert and release the connection afterwards
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private Connection getConnection(String addressParam, String driverClassNameParam) throws Exception {
        String address = addressParam;
        Class.forName(driverClassNameParam);
        Connection connection  = DriverManager.getConnection(address);
        return connection;
    }

    @Override
    public ResultSet getQueryResult(String database, String sql) throws Exception {
        Connection connection = getConnection("jdbc:clickhouse://127.0.0.1:8123/" + database,"ru.yandex.clickhouse.ClickHouseDriver");
        Statement statement = connection.createStatement();
        ResultSet resultSet = statement.executeQuery(sql);
        return resultSet;
    }
}

Modify the factory class

public class ClickUntilFactory {
    public static ClickUntil createClickUntil() {
        return ClickHouseUntil.createInstance();
    }
}
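
As a quick usage sketch of saveData (the table name and values here are only illustrative, and the target table must already exist in the ClickHouse test database): column names listed in the fields set are written as bare numbers, everything else is quoted as a string.

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class ClickUntilDemo {
    public static void main(String[] args) {
        ClickUntil clickUntil = ClickUntilFactory.createClickUntil();
        Map<String, String> dataMap = new HashMap<>();
        dataMap.put("timeinfo", "1636329600000");
        dataMap.put("sexlabel", "男");
        dataMap.put("numbers", "2");
        //Columns in this set are emitted unquoted (numeric), the rest are quoted strings
        Set<String> fields = new HashSet<>();
        fields.add("timeinfo");
        fields.add("numbers");
        //Produces roughly: insert into sexlabel_info (...) values ('男',1636329600000,2)
        clickUntil.saveData("sexlabel_info", dataMap, fields);
    }
}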

After running, query in ClickHouse

select * from test;

SELECT *
FROM test

Query id: 120a64ed-e2b9-4906-9cfd-e04bba081813

┌─id─┬─name─┬─create_date─┐
│  1 │ 小白 │  2021-10-10 │
│  2 │ 小黄 │  2021-10-10 │
│  3 │ 小花 │  2021-10-10 │
│  4 │ 小王 │  2021-10-10 │
└────┴──────┴─────────────┘
┌──id─┬─name────┬─create_date─┐
│ 111 │ xiaobai │  2018-09-07 │
└─────┴─────────┴─────────────┘

5 rows in set. Elapsed: 0.014 sec. 

Creating the user-portrait marketing sensitivity label

Create an advertisement action entity class

/**
 * Advertisement
 */
@Data
public class AdvisterOpertor {
    private Long advisterId; //advertisement id
    private Long productId; //product id
    private Long clickTime; //click time
    private Long publishTime; //publish time
    private Long stayTime; //dwell time
    private Long userId; //user id
    private Integer deviceType; //device type (0 PC, 1 WeChat mini program, 2 app)
    private String deviceId; //device id
    private Integer advisterType; //advertisement type (0 animation, 1 plain text, 2 video, 3 text plus animation)
    private Integer isStar; //whether a celebrity appears (0 no, 1 yes)
}

A marketing sensitivity entity class

/**
 * Marketing sensitivity
 */
@Data
public class MarketSensitivity {
    private Long userId; //user id
    private Long advisterId; //advertisement id
    private Integer advisterType; //advertisement type
    private String advisterTypeName; //advertisement type name
    private Integer orderNums; //number of orders
    private Integer adviserNums; //number of ad clicks
    private String groupField;
    private Long timeInfo;
    private String sensitivityFlag; //marketing sensitivity label
    private Long advisterTypeNums; //count for the same advertisement type
}

In Kafka's bin directory run

./kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic adviser

Modify the controller class in the earlier Spring Boot project to also simulate advertisement logs.

@RestController
public class DataController {
    @Autowired
    private KafkaProducer kafkaProducer;

    @PostMapping("/revicedata")
    public void reviceData(@RequestBody String data) {
        JSONObject jsonObject = JSONObject.parseObject(data);
        String type = jsonObject.getString("type");
        String topic = "";
        switch (type) {
            case "0":
                topic = "scan";
                break;
            case "1":
                topic = "collection";
                break;
            case "2":
                topic = "cart";
                break;
            case "3":
                topic = "attention";
                break;
            default:
                topic = "adviser";
                break;
        }
        kafkaProducer.produce(topic,data);
    }
}

A MarketSensitivityAnalyMap transformation class that implements the MapFunction interface.

public class MarketSensitivityAnalyMap implements MapFunction<JSONObject,MarketSensitivity> {
    @Override
    public MarketSensitivity map(JSONObject value) throws Exception {
        String adviser = value.getString("adviser");
        String orderStr = value.getString("order");
        AdvisterOpertor advisterOpertor = JSONObject.parseObject(adviser,AdvisterOpertor.class);
        Long userId = advisterOpertor.getUserId();
        Long advisterId = advisterOpertor.getAdvisterId();
        Integer advisterType = advisterOpertor.getAdvisterType();
        Order order = JSONObject.parseObject(orderStr,Order.class);
        Integer orderNums = 0;
        if (order != null) {
            orderNums = 1;
        }
        Integer adviserNums = 1;
        Long timeInfo = DateUntil.getCurrentHourStart(System.currentTimeMillis());
        MarketSensitivity marketSensitivity = new MarketSensitivity();
        marketSensitivity.setAdviserNums(adviserNums);
        marketSensitivity.setOrderNums(orderNums);
        marketSensitivity.setUserId(userId);
        marketSensitivity.setAdvisterId(advisterId);
        marketSensitivity.setTimeInfo(timeInfo);
        String fieldGroup = "MarketSensitivity==" + timeInfo + "==" + userId + "==" + advisterId;
        marketSensitivity.setGroupField(fieldGroup);
        marketSensitivity.setAdvisterType(advisterType);
        return marketSensitivity;
    }
}

A MarketSensitivityAnalyReduce aggregation class that implements the ReduceFunction interface

public class MarketSensitivityAnalyReduce implements ReduceFunction<MarketSensitivity> {
    @Override
    public MarketSensitivity reduce(MarketSensitivity value1, MarketSensitivity value2) throws Exception {
        Long userId = value1.getUserId();
        String groupField = value1.getGroupField();
        Long advisterId = value1.getAdvisterId();
        Long timeInfo = value1.getTimeInfo();
        Integer advisterType = value1.getAdvisterType();
        Integer advisterNums1 = value1.getAdviserNums();
        Integer orderNums1 = value1.getOrderNums();

        Integer advisterNums2 = value2.getAdviserNums();
        Integer orderNums2 = value2.getOrderNums();

        MarketSensitivity marketSensitivity = new MarketSensitivity();
        marketSensitivity.setUserId(userId);
        marketSensitivity.setGroupField(groupField);
        marketSensitivity.setAdvisterId(advisterId);
        marketSensitivity.setTimeInfo(timeInfo);
        marketSensitivity.setAdviserNums(advisterNums1 + advisterNums2);
        marketSensitivity.setOrderNums(orderNums1 + orderNums2);
        marketSensitivity.setAdvisterType(advisterType);

        return marketSensitivity;
    }
}

A MarketSensitivityAnalySink storage class that implements the SinkFunction interface

public class MarketSensitivityAnalySink implements SinkFunction<MarketSensitivity> {
    private ClickUntil clickUntil = ClickUntilFactory.createClickUntil();

    @Override
    public void invoke(MarketSensitivity value, Context context) throws Exception {
        if (value != null) {
            String groupField = value.getGroupField();
            String[] groupFields = groupField.split("==");
            String timeinfo = groupFields[1];
            String userId = groupFields[2];
            String advisterId = groupFields[3];
            Integer advisterNums = value.getAdviserNums();
            Integer orderNums = value.getOrderNums();
            String sensitivityFlag = "";
            if (advisterNums <= 2 && orderNums == 0) {
                //No more than 2 ad clicks and no order: the user is not sensitive to ads
                sensitivityFlag = "不敏感";
            }else if ((advisterNums > 5 && orderNums == 0)
                    || (advisterNums > 1 && orderNums == 1)) {
                //More than 5 clicks without an order, or more than 1 click with a single order:
                //the user's sensitivity to ads is average
                sensitivityFlag = "一般";
            }else if ((advisterNums > 1 && orderNums > 1)
                    || (advisterNums > 5 && orderNums == 1)) {
                //More than 1 click with multiple orders, or more than 5 clicks with a single order:
                //the user is highly sensitive to ads
                sensitivityFlag = "非常敏感";
            }
            String tablename = "userAdvSensitivity_info";
            Map<String,String> dataMap = new HashMap<>();
            dataMap.put("userId",userId);
            dataMap.put("advisterId",advisterId);
            dataMap.put("advisterNums",advisterNums + "");
            dataMap.put("orderNums",orderNums + "");
            dataMap.put("sensitivityFlag",sensitivityFlag);
            Set<String> fields = new HashSet<>();
            fields.add("userId");
            fields.add("advisterId");
            fields.add("advisterNums");
            fields.add("orderNums");
            clickUntil.saveData(tablename,dataMap,fields);
        }
    }
}

Then the Flink streaming job. Here the advertisement-click stream and the order stream are joined (a dual-stream join) to profile each user's sensitivity to marketing advertisements.

/**
 * Marketing (advertisement) sensitivity
 */
public class MarketSensitivityAnaly {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        Properties properties = new Properties();
        properties.setProperty("bootstrap.servers","127.0.0.1:9092");
        properties.setProperty("group.id","portrait");
        FlinkKafkaConsumer<String> myConsumerAdv = new FlinkKafkaConsumer<>("adviser",
                new SimpleStringSchema(),properties);
        DataStreamSource<String> dataAdv = env.addSource(myConsumerAdv);
        env.enableCheckpointing(5000);

        FlinkKafkaConsumer<String> myConsumerOrder = new FlinkKafkaConsumer<>("order",
                new SimpleStringSchema(),properties);
        DataStreamSource<String> dataOrder = env.addSource(myConsumerOrder);
        //Join the ad-click stream with the order stream
        DataStream<JSONObject> dataJoin = dataAdv.join(dataOrder).where(new KeySelector<String, String>() {
            @Override
            public String getKey(String value) throws Exception {
                AdvisterOpertor advisterOpertor = JSONObject.parseObject(value, AdvisterOpertor.class);
                Long advisterId = advisterOpertor.getAdvisterId();
                return advisterOpertor.getUserId() + "==" + advisterId;
            }
        }).equalTo(new KeySelector<String, String>() {
            @Override
            public String getKey(String value) throws Exception {
                Order order = JSONObject.parseObject(value, Order.class);
                Long advisterId = order.getAdvisterId();
                return order.getUserId() + "==" + advisterId;
            }
        }).window(TumblingProcessingTimeWindows.of(Time.hours(1)))
                .apply(new JoinFunction<String, String, JSONObject>() {
                    @Override
                    public JSONObject join(String first, String second) throws Exception {
                        JSONObject jsonObject = new JSONObject();
                        jsonObject.put("adviser", first);
                        jsonObject.put("order", second);
                        return jsonObject;
                    }
                });
        DataStream<MarketSensitivity> map = dataJoin.map(new MarketSensitivityAnalyMap());
        DataStream<MarketSensitivity> reduct = map.keyBy(MarketSensitivity::getGroupField)
                .window(TumblingProcessingTimeWindows.of(Time.hours(1)))
                .reduce(new MarketSensitivityAnalyReduce());
        reduct.addSink(new MarketSensitivityAnalySink());
        env.execute("portrait market sensitivity");
    }
}

Creating the user-portrait advertisement-type marketing sensitivity label

Create an AdvisterTypeMarketSensitivityAnalyMap transformation class that implements the MapFunction interface

public class AdvisterTypeMarketSensitivityAnalyMap implements MapFunction<MarketSensitivity,MarketSensitivity> {

    @Override
    public MarketSensitivity map(MarketSensitivity value) throws Exception {
        Long timeInfo = value.getTimeInfo();
        Integer advisterType = value.getAdvisterType();
        Integer advisterNums = value.getAdviserNums();
        Integer orderNums = value.getOrderNums();
        String sensitivityFlag = "";
        if (advisterNums <= 2 && orderNums == 0) {
            //No more than 2 ad clicks and no order: the user is not sensitive to ads
            sensitivityFlag = "不敏感";
        }else if ((advisterNums > 5 && orderNums == 0)
                || (advisterNums > 1 && orderNums == 1)) {
            //More than 5 clicks without an order, or more than 1 click with a single order:
            //the user's sensitivity to ads is average
            sensitivityFlag = "一般";
        }else if ((advisterNums > 1 && orderNums > 1)
                || (advisterNums > 5 && orderNums == 1)) {
            //More than 1 click with multiple orders, or more than 5 clicks with a single order:
            //the user is highly sensitive to ads
            sensitivityFlag = "非常敏感";
        }
        String advisterTypeName = "";
        switch (advisterType) {
            case 0:
                advisterTypeName = "动画";
                break;
            case 1:
                advisterTypeName = "纯文字";
                break;
            case 2:
                advisterTypeName = "视频";
                break;
            case 3:
                advisterTypeName = "文字加动画";
                break;
            default:
                break;
        }
        String groupField = "advisterType==" + timeInfo + "==" + advisterType
                + "==" + sensitivityFlag;
        Long advisterTypeNums = 1L;
        MarketSensitivity marketSensitivity = new MarketSensitivity();
        marketSensitivity.setGroupField(groupField);
        marketSensitivity.setAdvisterTypeName(advisterTypeName);
        marketSensitivity.setTimeInfo(timeInfo);
        marketSensitivity.setAdvisterTypeNums(advisterTypeNums);
        marketSensitivity.setSensitivityFlag(sensitivityFlag);

        return marketSensitivity;
    }
}

An AdvisterTypeMarketSensitivityAnalyReduce aggregation class that implements the ReduceFunction interface

public class AdvisterTypeMarketSensitivityAnalyReduce implements ReduceFunction<MarketSensitivity> {
    @Override
    public MarketSensitivity reduce(MarketSensitivity value1, MarketSensitivity value2) throws Exception {
        String advisterTypeName = value1.getAdvisterTypeName();
        Long timeInfo = value1.getTimeInfo();
        String sensitivityFlag = value1.getSensitivityFlag();
        String groupField = value1.getGroupField();

        Long advisterTypeNums1 = value1.getAdvisterTypeNums();
        Long advisterTypeNums2 = value2.getAdvisterTypeNums();

        MarketSensitivity marketSensitivity = new MarketSensitivity();
        marketSensitivity.setAdvisterTypeName(advisterTypeName);
        marketSensitivity.setTimeInfo(timeInfo);
        marketSensitivity.setGroupField(groupField);
        marketSensitivity.setSensitivityFlag(sensitivityFlag);
        marketSensitivity.setAdvisterTypeNums(advisterTypeNums1 + advisterTypeNums2);
        return marketSensitivity;
    }
}

An AdvisterTypeMarketSensitivityAnalySink label storage class that implements the SinkFunction interface

public class AdvisterTypeMarketSensitivityAnalySink implements SinkFunction<MarketSensitivity> {
    private ClickUntil clickUntil = ClickUntilFactory.createClickUntil();

    @Override
    public void invoke(MarketSensitivity value, Context context) throws Exception {
        if (value != null) {
            String advisterTypeName = value.getAdvisterTypeName();
            Long timeInfo = value.getTimeInfo();
            String sensitivityFlag = value.getSensitivityFlag();
            Long advisterTypeNums = value.getAdvisterTypeNums();

            String tablename = "advistertype_info";
            Map<String, String> dataMap = new HashMap<>();
            dataMap.put("advistertypename",advisterTypeName);
            dataMap.put("timeinfo",timeInfo + "");
            dataMap.put("sensitivityflag",sensitivityFlag);
            dataMap.put("advistertypenums",advisterTypeNums + "");
            Set<String> fields = new HashSet<>();
            fields.add("timeinfo");
            fields.add("advistertypenums");
            clickUntil.saveData(tablename,dataMap,fields);
        }
    }
}

Then modify the MarketSensitivityAnaly Flink streaming job

/**
 * Marketing (advertisement) sensitivity
 */
public class MarketSensitivityAnaly {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        Properties properties = new Properties();
        properties.setProperty("bootstrap.servers","127.0.0.1:9092");
        properties.setProperty("group.id","portrait");
        FlinkKafkaConsumer<String> myConsumerAdv = new FlinkKafkaConsumer<>("adviser",
                new SimpleStringSchema(),properties);
        DataStreamSource<String> dataAdv = env.addSource(myConsumerAdv);
        env.enableCheckpointing(5000);

        FlinkKafkaConsumer<String> myConsumerOrder = new FlinkKafkaConsumer<>("order",
                new SimpleStringSchema(),properties);
        DataStreamSource<String> dataOrder = env.addSource(myConsumerOrder);
        //Join the ad-click stream with the order stream
        DataStream<JSONObject> dataJoin = dataAdv.join(dataOrder).where(new KeySelector<String, String>() {
            @Override
            public String getKey(String value) throws Exception {
                AdvisterOpertor advisterOpertor = JSONObject.parseObject(value, AdvisterOpertor.class);
                Long advisterId = advisterOpertor.getAdvisterId();
                return advisterOpertor.getUserId() + "==" + advisterId;
            }
        }).equalTo(new KeySelector<String, String>() {
            @Override
            public String getKey(String value) throws Exception {
                Order order = JSONObject.parseObject(value, Order.class);
                Long advisterId = order.getAdvisterId();
                return order.getUserId() + "==" + advisterId;
            }
        }).window(TumblingProcessingTimeWindows.of(Time.hours(1)))
                .apply(new JoinFunction<String, String, JSONObject>() {
                    @Override
                    public JSONObject join(String first, String second) throws Exception {
                        JSONObject jsonObject = new JSONObject();
                        jsonObject.put("adviser", first);
                        jsonObject.put("order", second);
                        return jsonObject;
                    }
                });
        DataStream<MarketSensitivity> map = dataJoin.map(new MarketSensitivityAnalyMap());
        DataStream<MarketSensitivity> reduct = map.keyBy(MarketSensitivity::getGroupField)
                .window(TumblingProcessingTimeWindows.of(Time.hours(1)))
                .reduce(new MarketSensitivityAnalyReduce());
        reduct.addSink(new MarketSensitivityAnalySink());
        
        DataStream<MarketSensitivity> advisterTypeMap = reduct.map(new AdvisterTypeMarketSensitivityAnalyMap());
        DataStream<MarketSensitivity> advisterTypeReduce = advisterTypeMap.keyBy(MarketSensitivity::getGroupField)
                .window(TumblingProcessingTimeWindows.of(Time.hours(1)))
                .reduce(new AdvisterTypeMarketSensitivityAnalyReduce());
        advisterTypeReduce.addSink(new AdvisterTypeMarketSensitivityAnalySink());

        env.execute("portrait market sensitivity");
    }
}
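
To smoke-test the join locally, a matching pair of messages can be pushed onto the two topics. This sketch assumes an order topic created the same way as adviser, and that the real Order class contains at least userId and advisterId; every field value below is made up.

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.util.Properties;

public class AdviserOrderTestProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "127.0.0.1:9092");
        props.setProperty("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.setProperty("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            long now = System.currentTimeMillis();
            //An advertisement click event mirroring the AdvisterOpertor fields
            String adviser = "{\"advisterId\":9,\"productId\":101,\"clickTime\":" + now
                    + ",\"publishTime\":" + now + ",\"stayTime\":30,\"userId\":1,"
                    + "\"deviceType\":2,\"deviceId\":\"d-001\",\"advisterType\":2,\"isStar\":1}";
            //Only userId and advisterId are needed by the join key selectors
            String order = "{\"userId\":1,\"advisterId\":9}";
            producer.send(new ProducerRecord<>("adviser", adviser));
            producer.send(new ProducerRecord<>("order", order));
            producer.flush();
        }
    }
}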