Hadoop 3 Cluster Setup: Adding a Custom UDTF to Hive (One Input Row, Multiple Output Rows)
Published: 2019-06-11


Previous post:

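In the previous post, the UDTF took one input row and produced one output row. A UDTF can also take one input row and produce multiple output rows.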

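The requirement, briefly:

Given a start time and an end time, return the number of seconds that fall within each hour of that interval.

Here is the code: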

package com.venn.udtf;

import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.exec.UDFArgumentLengthException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

import java.util.ArrayList;

/**
 * Created by venn on 5/20/2018.
 * SplitHour : split a time interval into one row per hour
 */
public class SplitHour extends GenericUDTF {

    /**
     * Validate the argument and declare the output column names and types.
     */
    @Override
    public StructObjectInspector initialize(ObjectInspector[] args) throws UDFArgumentException {
        if (args.length != 1) {
            throw new UDFArgumentLengthException("SplitHour takes exactly one argument");
        }
        if (args[0].getCategory() != ObjectInspector.Category.PRIMITIVE) {
            throw new UDFArgumentException("SplitHour takes a string as its parameter");
        }

        // The generic type parameters were lost in the page extraction; restored here.
        ArrayList<String> fieldNames = new ArrayList<String>();
        ArrayList<ObjectInspector> fieldOIs = new ArrayList<ObjectInspector>();
        fieldNames.add("begintime");
        fieldOIs.add(PrimitiveObjectInspectorFactory.javaStringObjectInspector);
        fieldNames.add("endtime");
        fieldOIs.add(PrimitiveObjectInspectorFactory.javaStringObjectInspector);
        fieldNames.add("hour");
        fieldOIs.add(PrimitiveObjectInspectorFactory.javaStringObjectInspector);
        fieldNames.add("seconds");
        fieldOIs.add(PrimitiveObjectInspectorFactory.javaStringObjectInspector);

        return ObjectInspectorFactory.getStandardStructObjectInspector(fieldNames, fieldOIs);
    }

    /**
     * Process one input row of the form "begintime,endtime" and forward one row per hour.
     * Both timestamps look like 2018-06-06 10:25:35 and are assumed to fall on the same day.
     */
    @Override
    public void process(Object[] objects) throws HiveException {
        String[] input = objects[0].toString().split(",");
        String beginTime = input[0];
        String endTime = input[1];

        String[] result = new String[4];
        result[0] = beginTime;
        result[1] = endTime;

        // begin time
        int bhour = Integer.parseInt(beginTime.substring(11, 13));
        int bmin = Integer.parseInt(beginTime.substring(14, 16));
        int bsecond = Integer.parseInt(beginTime.substring(17, 19));
        // end time
        int ehour = Integer.parseInt(endTime.substring(11, 13));
        int emin = Integer.parseInt(endTime.substring(14, 16));
        int esecond = Integer.parseInt(endTime.substring(17, 19));

        // Only the hour fields are compared, so an end hour earlier than the begin hour
        // (for example an interval crossing midnight) would keep the loop below running.
        if (ehour < bhour) {
            throw new HiveException("SplitHour expects endtime to be on the same day as begintime and not earlier");
        }

        // 1. Same hour: seconds = (emin - bmin) * 60 + (esecond - bsecond)
        if (bhour == ehour) {
            result[2] = String.valueOf(bhour);
            result[3] = String.valueOf((emin - bmin) * 60 + (esecond - bsecond));
            forward(result);
            return;
        }

        boolean flag = true;
        // TODO: emit the first hour before the loop so the loop no longer needs the flag check
        while (bhour != ehour) {
            result[2] = String.valueOf(bhour);
            if (flag) {
                flag = false;
                // 2. First hour: seconds = 3600 - bmin * 60 - bsecond
                result[3] = String.valueOf(3600 - bmin * 60 - bsecond);
            } else {
                // 3. Every full hour in between is 3600 seconds
                result[3] = String.valueOf(3600);
            }
            bhour += 1;
            // emit the row to Hive
            forward(result);
        }

        result[2] = String.valueOf(bhour);
        // 4. Last hour: seconds = emin * 60 + esecond
        result[3] = String.valueOf(emin * 60 + esecond);
        forward(result);
    }

    @Override
    public void close() throws HiveException {
    }
}
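The hour-splitting arithmetic can be tried outside Hive. The following is a minimal sketch in plain Java, not the author's code: the class name `HourSplitter` and the method `split` are hypothetical, and it swaps the substring parsing for `java.time`, which lets it also handle intervals that cross midnight. It assumes timestamps in the `yyyy-MM-dd HH:mm:ss` format used above.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; it is not part of the UDTF above.
class HourSplitter {
    private static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    /** Maps the start of each hour touched by the interval to the seconds spent in that hour. */
    static Map<LocalDateTime, Long> split(String begin, String end) {
        LocalDateTime cursor = LocalDateTime.parse(begin, FMT);
        LocalDateTime stop = LocalDateTime.parse(end, FMT);
        if (stop.isBefore(cursor)) {
            throw new IllegalArgumentException("end must not be before begin");
        }
        Map<LocalDateTime, Long> out = new LinkedHashMap<>();
        do {
            LocalDateTime hourStart = cursor.truncatedTo(ChronoUnit.HOURS);
            LocalDateTime nextHour = hourStart.plusHours(1);
            // The slice ends at the next hour boundary or at the end time, whichever is first.
            LocalDateTime sliceEnd = nextHour.isBefore(stop) ? nextHour : stop;
            out.put(hourStart, Duration.between(cursor, sliceEnd).getSeconds());
            cursor = sliceEnd;
        } while (cursor.isBefore(stop));
        return out;
    }

    public static void main(String[] args) {
        // Prints "6	380" and then "7	129" (hour, tab, seconds).
        split("2018-04-01 06:53:40", "2018-04-01 07:02:09")
                .forEach((hour, secs) -> System.out.println(hour.getHour() + "\t" + secs));
    }
}
```

One behavioral difference from the UDTF: when the end time falls exactly on an hour boundary, this sketch does not emit a trailing 0-second row for that hour, whereas the UDTF does.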

For an introduction to UDTFs, see the previous post.

For how to register and use the function, see the previous post.

Example:

hive> select split_hour( concat(begintime,',',endtime)) from viewlog where log_date=20180401 limit 10;
OK
begintime              endtime                hour    seconds
2018-04-01 10:26:14    2018-04-01 10:26:21    10      7
2018-04-01 07:21:47    2018-04-01 07:22:23    7       36
2018-04-01 15:18:08    2018-04-01 15:18:11    15      3
2018-04-01 18:05:13    2018-04-01 18:05:28    18      15
2018-04-01 07:18:34    2018-04-01 07:18:52    7       18
2018-04-01 23:28:32    2018-04-01 23:29:44    23      72
2018-04-01 06:34:11    2018-04-01 06:34:17    6       6
2018-04-01 14:02:40    2018-04-01 14:03:33    14      53
2018-04-01 17:30:23    2018-04-01 17:30:26    17      3
2018-04-01 12:15:07    2018-04-01 12:15:11    12      4
2018-04-01 06:53:40    2018-04-01 07:02:09    6       380
2018-04-01 06:53:40    2018-04-01 07:02:09    7       129
Time taken: 2.238 seconds, Fetched: 12 row(s)

Done.

Reposted from: https://www.cnblogs.com/Springmoon-venn/p/9286670.html
