Details
-
Type:
Task
-
Status: Closed
-
Priority:
Major
-
Resolution: Fixed
-
Affects Version/s: None
-
Fix Version/s: 0.10.0
-
Component/s: Serializers/Deserializers
-
Labels:None
-
Hadoop Flags:Reviewed
Description
RegexSerDe is as much a part of the standard Hive distribution as the other SerDes
currently in hive-serde. I think we should move it over to the hive-serde module so that
users don't have to go to the added effort of manually registering the contrib jar before
using it.
-
- ASF.LICENSE.NOT.GRANTED--HIVE-1719.D3051.1.patch
- 10 kB
- Phabricator
-
- ASF.LICENSE.NOT.GRANTED--HIVE-1719.D3051.2.patch
- 10 kB
- Phabricator
-
- ASF.LICENSE.NOT.GRANTED--HIVE-1719.D3141.1.patch
- 39 kB
- Phabricator
-
- ASF.LICENSE.NOT.GRANTED--HIVE-1719.D3249.1.patch
- 29 kB
- Phabricator
-
- ASF.LICENSE.NOT.GRANTED--HIVE-1719.D3249.2.patch
- 28 kB
- Phabricator
-
- ASF.LICENSE.NOT.GRANTED--HIVE-1719.D3249.3.patch
- 28 kB
- Phabricator
-
- ASF.LICENSE.NOT.GRANTED--HIVE-1719.D3249.4.patch
- 28 kB
- Phabricator
-
- HIVE-1719.3.patch
- 27 kB
- Shreepadma Venugopalan
-
- HIVE-1719.D3249.1.patch
- 29 kB
- Shreepadma Venugopalan
Activity
- All
- Comments
- Work Log
- History
- Activity
- Transitions
shreepadma requested code review of "HIVE-1719 [jira] Move RegexSerDe out of hive-contrib and over to hive-serde".
Reviewers: JIRA, cwsteinbach
HIVE-1719.Move Regex Serde
RegexSerDe is as much a part of the standard Hive distribution as the other
SerDes
currently in hive-serde. I think we should move it over to the hive-serde module
so that
users don't have to go to the added effort of manually registering the contrib
jar before
using it.
TEST PLAN
EMPTY
REVISION DETAIL
https://reviews.facebook.net/D3051
AFFECTED FILES
serde/src/java/org/apache/hadoop/hive/serde2/RegexSerDe.java
shreepadma updated the revision "HIVE-1719 [jira] Move RegexSerDe out of hive-contrib and over to hive-serde".
Reviewers: JIRA, cwsteinbach
- This patch includes the deprecation warnings on top of the previous patch which moved the RegexSerDe to the SerDe directory
REVISION DETAIL
https://reviews.facebook.net/D3051
AFFECTED FILES
serde/src/java/org/apache/hadoop/hive/serde2/RegexSerDe.java
cwsteinbach has requested changes to the revision "HIVE-1719 [jira] Move RegexSerDe out of hive-contrib and over to hive-serde".
Looks good, but we need to copy the regex serde testcases from contrib over to ql. I also noticed that the negative testcase isn't documented, and doesn't seem to exercise any of the error conditions in RegexSerDe.initialize(). Can you please add some additional test coverage for these cases? Thanks.
REVISION DETAIL
https://reviews.facebook.net/D3051
BRANCH
HIVE-1719
This patch makes the following changes,
- Moves Regex SerDe to hive-serde
- Deprecates Regex SerDe under hive-contrib. However the behavior of Regex Serde under hive-contrib is not altered in anyway.
Additionally it alters the behavior of Regex SerDe under hive-serde in the following ways,
- Raises an exception when the input pattern to Regex is null
- Logs a warning when the # of matching groups don't match the # of cols
- Deprecates the output.format.string
This patch also improves coverage for regex serde under hive-serde by adding additional positive and negative test cases. Please note that the regex serde in the contrib module contains a deprecation warning, but its behavior is not altered in any way.
cwsteinbach requested code review of "HIVE-1719 [jira] Move RegexSerDe out of hive-contrib and over to hive-serde".
Reviewers: JIRA
https://issues.apache.org/jira/secure/attachment/12526449/HIVE-1719.patch
RegexSerDe is as much a part of the standard Hive distribution as the other SerDes
currently in hive-serde. I think we should move it over to the hive-serde module so that
users don't have to go to the added effort of manually registering the contrib jar before
using it.
TEST PLAN
EMPTY
REVISION DETAIL
https://reviews.facebook.net/D3141
AFFECTED FILES
contrib/src/test/queries/clientnegative/serde_regex.q
contrib/src/test/queries/clientpositive/serde_regex.q
contrib/src/test/results/clientnegative/serde_regex.q.out
contrib/src/test/results/clientpositive/serde_regex.q.out
ql/src/test/queries/clientnegative/serde_regex.q
ql/src/test/queries/clientnegative/serde_regex2.q
ql/src/test/queries/clientnegative/serde_regex3.q
ql/src/test/queries/clientpositive/serde_regex.q
ql/src/test/queries/clientpositive/serde_regex2.q
ql/src/test/queries/clientpositive/serde_regex3.q
ql/src/test/results/clientnegative/serde_regex.q.out
ql/src/test/results/clientnegative/serde_regex2.q.out
ql/src/test/results/clientnegative/serde_regex3.q.out
ql/src/test/results/clientpositive/serde_regex.q.out
ql/src/test/results/clientpositive/serde_regex2.q.out
ql/src/test/results/clientpositive/serde_regex3.q.out
serde/src/java/org/apache/hadoop/hive/serde2/RegexSerDe.java
MANAGE HERALD DIFFERENTIAL RULES
https://reviews.facebook.net/herald/view/differential/
WHY DID I GET THIS EMAIL?
https://reviews.facebook.net/herald/transcript/7149/
To: JIRA, cwsteinbach
Cc: cwsteinbach
cwsteinbach has commented on the revision "HIVE-1719 [jira] Move RegexSerDe out of hive-contrib and over to hive-serde".
Submitted this review on behalf on Shreepadma using this patch: https://issues.apache.org/jira/secure/attachment/12526449/HIVE-1719.patch
REVISION DETAIL
https://reviews.facebook.net/D3141
To: JIRA, cwsteinbach
Cc: shreepadma
cwsteinbach has commented on the revision "HIVE-1719 [jira] Move RegexSerDe out of hive-contrib and over to hive-serde".
INLINE COMMENTS
contrib/src/test/queries/clientnegative/serde_regex.q:24 Since the first CREATE TABLE statement is expected to fail, we should probably remove the rest of this code.
contrib/src/test/queries/clientpositive/serde_regex.q:3 This isn't necessary. QTestUtil.clearTestSideEffects() automatically drops any tables/dbs that aren't defined in QTestUtil.srcTables. (Also, take a look at QTestUtil.createSources() ... I remember you asked about this yesterday).
contrib/src/test/queries/clientpositive/serde_regex.q:43 Please remove.
contrib/src/test/queries/clientnegative/serde_regex.q:41 Please refer to a system variable (set in one of the Ant build files) instead of using a relative path, e.g. "$
/files/apache.access.log"
contrib/src/test/queries/clientpositive/serde_regex.q:38 Replace relative paths.
ql/src/test/queries/clientnegative/serde_regex.q:2 Please remove.
ql/src/test/queries/clientnegative/serde_regex.q:23 Remove the rest of this code if we're never going to hit it.
ql/src/test/queries/clientnegative/serde_regex2.q:2 Please remove.
ql/src/test/queries/clientnegative/serde_regex2.q:3 Same test as clientnegative/serde_regex.q?
ql/src/test/queries/clientnegative/serde_regex2.q:20 Remove dead code.
ql/src/test/queries/clientnegative/serde_regex3.q:1 Remove. Replace with 'USE default;' if you're running into problems placing a comment on the first line of the file.
ql/src/test/queries/clientpositive/serde_regex.q:1 Remove.
ql/src/test/queries/clientpositive/serde_regex.q:4 I think we should either roll the clientpositive tests into a single file (my preference), or modify the qfile file names so that they give some hint as to what is being tested. Please take a look at clientpositive/database.q for a good example of the former approach.
ql/src/test/queries/clientpositive/serde_regex2.q:27 Please add a newline. You might want to configure your editor to add these automatically.
ql/src/test/queries/clientpositive/serde_regex.q:39 It's a little dangerous using 'SELECT *' in tests because it's not always possible to easily determine from the output which column contains which field. In addition to 'SELECT *', please also include a query which selects a single column in the middle in of the table, and a query which selects a subset of two or more columns from the table.
ql/src/test/queries/clientpositive/serde_regex2.q:4 I thought about this some more and think we should probably throw a runtime exception in the deserialize() method if this condition is detected. Otherwise, we're basically signing up to support this behavior in the future, which is not what we want to do.
Please file a JIRA for this issue (i.e. that we want to catch this condition at compile-time instead of runtime), and then move this test case over to the clientnegative directory and cite the JIRA in a comment. Thanks.
ql/src/test/queries/clientpositive/serde_regex3.q:3 This comment implies that output.format.string actually did something in the past, which isn't true. Also, the warning message doesn't show up in the q.out file, so I think we should probably just skip referencing this property in the tests.
serde/src/java/org/apache/hadoop/hive/serde2/RegexSerDe.java:42 Need to update this javadoc. RegexSerDe only provides deserialization capabilities.
serde/src/java/org/apache/hadoop/hive/serde2/RegexSerDe.java:217 Formatting: Braces always go on their own line. This project basically uses the Sun Java Coding standard (http://www.oracle.com/technetwork/java/codeconv-138413.html) with a 100 char line limit and 2 character indent.
serde/src/java/org/apache/hadoop/hive/serde2/RegexSerDe.java:217 "Not support yet" is inaccurate since we never plan to support this feature. Please change this to "RegexSerDe does not support the serialize() method"
serde/src/java/org/apache/hadoop/hive/serde2/RegexSerDe.java:188 It's possible that this will get triggered for every row in the input file, which will quickly overflow the logs. We should log this at most once. Same thing goes for the log message below.
Currently SerDes don't really have a good way of logging error information like this. We should talk offline about ways of improving this.
REVISION DETAIL
https://reviews.facebook.net/D3141
To: JIRA, cwsteinbach
Cc: shreepadma
shreepadma requested code review of "HIVE-1719 [jira] Move RegexSerDe out of hive-contrib and over to hive-serde".
Reviewers: JIRA, cwsteinbach
Regex Serde Changes
RegexSerDe is as much a part of the standard Hive distribution as the other SerDes
currently in hive-serde. I think we should move it over to the hive-serde module so that
users don't have to go to the added effort of manually registering the contrib jar before
using it.
TEST PLAN
EMPTY
REVISION DETAIL
https://reviews.facebook.net/D3249
AFFECTED FILES
contrib/src/test/queries/clientnegative/serde_regex.q
contrib/src/test/queries/clientpositive/serde_regex.q
contrib/src/test/results/clientnegative/serde_regex.q.out
contrib/src/test/results/clientpositive/serde_regex.q.out
ql/src/test/queries/clientnegative/serde_regex.q
ql/src/test/queries/clientnegative/serde_regex2.q
ql/src/test/queries/clientnegative/serde_regex3.q
ql/src/test/queries/clientpositive/serde_regex.q
ql/src/test/results/clientnegative/serde_regex.q.out
ql/src/test/results/clientnegative/serde_regex2.q.out
ql/src/test/results/clientnegative/serde_regex3.q.out
ql/src/test/results/clientpositive/serde_regex.q.out
serde/src/java/org/apache/hadoop/hive/serde2/RegexSerDe.java
MANAGE HERALD DIFFERENTIAL RULES
https://reviews.facebook.net/herald/view/differential/
WHY DID I GET THIS EMAIL?
https://reviews.facebook.net/herald/transcript/7341/
To: JIRA, cwsteinbach, shreepadma
Cc: shreepadma
cwsteinbach has requested changes to the revision "HIVE-1719 [jira] Move RegexSerDe out of hive-contrib and over to hive-serde".
INLINE COMMENTS
serde/src/java/org/apache/hadoop/hive/serde2/RegexSerDe.java:45 Please remove "It can also serialize the row object using a format string."
serde/src/java/org/apache/hadoop/hive/serde2/RegexSerDe.java:42 Please change to "RegexSerDe uses a regular expression to deserialize row data. It does not support data serialization."
serde/src/java/org/apache/hadoop/hive/serde2/RegexSerDe.java:93 Please remove the outputFormatString variable and just do:
if (null != tbl.getProperty("output.format.string"))
{ LOG.warn(....); } serde/src/java/org/apache/hadoop/hive/serde2/RegexSerDe.java:102 Throw the exception here instead of in the next IF block.
serde/src/java/org/apache/hadoop/hive/serde2/RegexSerDe.java:186 if (!alreadyLoggedNoMatch)
serde/src/java/org/apache/hadoop/hive/serde2/RegexSerDe.java:158 Adding "Count" to the end of this variable name would help to make it clear that this is a scalar variable as opposed to a collection type.
serde/src/java/org/apache/hadoop/hive/serde2/RegexSerDe.java:184 Does this logic do anything? It looks like the code will log the first unmatched row it finds, set alreadyLoggedNoMatch = true, and subsequently never log another warning. Is there any reason to call getNextNumberToDisplay()?
serde/src/java/org/apache/hadoop/hive/serde2/RegexSerDe.java:205 I think it's fine to log one warning for the first row that has partial matches. No need to get fancy with this exponential backoff logging strategy.
REVISION DETAIL
https://reviews.facebook.net/D3249
BRANCH
HIVE-1719-trunk
To: JIRA, cwsteinbach, shreepadma
shreepadma updated the revision "HIVE-1719 [jira] Move RegexSerDe out of hive-contrib and over to hive-serde".
Reviewers: JIRA, cwsteinbach
Regex Serde Changes
RegexSerDe is as much a part of the standard Hive distribution as the other SerDes
currently in hive-serde. I think we should move it over to the hive-serde module so that
users don't have to go to the added effort of manually registering the contrib jar before
using it.
This includes the changes requested by Carl from the previous revision.
REVISION DETAIL
https://reviews.facebook.net/D3249
AFFECTED FILES
contrib/src/test/queries/clientnegative/serde_regex.q
contrib/src/test/queries/clientpositive/serde_regex.q
contrib/src/test/results/clientnegative/serde_regex.q.out
contrib/src/test/results/clientpositive/serde_regex.q.out
ql/src/test/queries/clientnegative/serde_regex.q
ql/src/test/queries/clientnegative/serde_regex2.q
ql/src/test/queries/clientnegative/serde_regex3.q
ql/src/test/queries/clientpositive/serde_regex.q
ql/src/test/results/clientnegative/serde_regex.q.out
ql/src/test/results/clientnegative/serde_regex2.q.out
ql/src/test/results/clientnegative/serde_regex3.q.out
ql/src/test/results/clientpositive/serde_regex.q.out
serde/src/java/org/apache/hadoop/hive/serde2/RegexSerDe.java
To: JIRA, cwsteinbach, shreepadma
shreepadma updated the revision "HIVE-1719 [jira] Move RegexSerDe out of hive-contrib and over to hive-serde".
Reviewers: JIRA, cwsteinbach
HIVE-1719: Changes requested by Carl
REVISION DETAIL
https://reviews.facebook.net/D3249
AFFECTED FILES
contrib/src/test/queries/clientnegative/serde_regex.q
contrib/src/test/queries/clientpositive/serde_regex.q
contrib/src/test/results/clientnegative/serde_regex.q.out
contrib/src/test/results/clientpositive/serde_regex.q.out
ql/src/test/queries/clientnegative/serde_regex.q
ql/src/test/queries/clientnegative/serde_regex2.q
ql/src/test/queries/clientnegative/serde_regex3.q
ql/src/test/queries/clientpositive/serde_regex.q
ql/src/test/results/clientnegative/serde_regex.q.out
ql/src/test/results/clientnegative/serde_regex2.q.out
ql/src/test/results/clientnegative/serde_regex3.q.out
ql/src/test/results/clientpositive/serde_regex.q.out
serde/src/java/org/apache/hadoop/hive/serde2/RegexSerDe.java
To: JIRA, cwsteinbach, shreepadma
cwsteinbach has accepted the revision "HIVE-1719 [jira] Move RegexSerDe out of hive-contrib and over to hive-serde".
+1. Will commit if tests pass.
REVISION DETAIL
https://reviews.facebook.net/D3249
BRANCH
HIVE-1719-trunk
To: JIRA, cwsteinbach, shreepadma
shreepadma updated the revision "HIVE-1719 [jira] Move RegexSerDe out of hive-contrib and over to hive-serde".
Reviewers: JIRA, cwsteinbach
- fix test to workaround order by issue
REVISION DETAIL
https://reviews.facebook.net/D3249
AFFECTED FILES
contrib/src/test/queries/clientnegative/serde_regex.q
contrib/src/test/queries/clientpositive/serde_regex.q
contrib/src/test/results/clientnegative/serde_regex.q.out
contrib/src/test/results/clientpositive/serde_regex.q.out
ql/src/test/queries/clientnegative/serde_regex.q
ql/src/test/queries/clientnegative/serde_regex2.q
ql/src/test/queries/clientnegative/serde_regex3.q
ql/src/test/queries/clientpositive/serde_regex.q
ql/src/test/results/clientnegative/serde_regex.q.out
ql/src/test/results/clientnegative/serde_regex2.q.out
ql/src/test/results/clientnegative/serde_regex3.q.out
ql/src/test/results/clientpositive/serde_regex.q.out
serde/src/java/org/apache/hadoop/hive/serde2/RegexSerDe.java
To: JIRA, cwsteinbach, shreepadma
shreepadma has closed the revision "HIVE-1719 [jira] Move RegexSerDe out of hive-contrib and over to hive-serde".
Closed by cws.
REVISION DETAIL
https://reviews.facebook.net/D3249
COMMIT
https://reviews.facebook.net/rHIVE1340256
To: JIRA, cwsteinbach, shreepadma
Committed to trunk. Thanks Shreepadma!
Integrated in Hive-trunk-h0.21 #1439 (See https://builds.apache.org/job/Hive-trunk-h0.21/1439/)
HIVE-1719 [jira] Move RegexSerDe out of hive-contrib and over to hive-serde
(Shreepadma Venugopalan via Carl Steinbach)
Summary:
Regex Serde Changes
RegexSerDe is as much a part of the standard Hive distribution as the other SerDes
currently in hive-serde. I think we should move it over to the hive-serde module so that
users don't have to go to the added effort of manually registering the contrib jar before
using it.
Test Plan: EMPTY
Reviewers: JIRA, cwsteinbach
Reviewed By: cwsteinbach
Differential Revision: https://reviews.facebook.net/D3249 (Revision 1340256)
Result = FAILURE
cws : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1340256
Files :
- /hive/trunk/contrib/src/test/queries/clientnegative/serde_regex.q
- /hive/trunk/contrib/src/test/queries/clientpositive/serde_regex.q
- /hive/trunk/contrib/src/test/results/clientnegative/serde_regex.q.out
- /hive/trunk/contrib/src/test/results/clientpositive/serde_regex.q.out
- /hive/trunk/ql/src/test/queries/clientnegative/serde_regex.q
- /hive/trunk/ql/src/test/queries/clientnegative/serde_regex2.q
- /hive/trunk/ql/src/test/queries/clientnegative/serde_regex3.q
- /hive/trunk/ql/src/test/queries/clientpositive/serde_regex.q
- /hive/trunk/ql/src/test/results/clientnegative/serde_regex.q.out
- /hive/trunk/ql/src/test/results/clientnegative/serde_regex2.q.out
- /hive/trunk/ql/src/test/results/clientnegative/serde_regex3.q.out
- /hive/trunk/ql/src/test/results/clientpositive/serde_regex.q.out
- /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/RegexSerDe.java
Integrated in Hive-trunk-hadoop2 #54 (See https://builds.apache.org/job/Hive-trunk-hadoop2/54/)
HIVE-1719 [jira] Move RegexSerDe out of hive-contrib and over to hive-serde
(Shreepadma Venugopalan via Carl Steinbach)
Summary:
Regex Serde Changes
RegexSerDe is as much a part of the standard Hive distribution as the other SerDes
currently in hive-serde. I think we should move it over to the hive-serde module so that
users don't have to go to the added effort of manually registering the contrib jar before
using it.
Test Plan: EMPTY
Reviewers: JIRA, cwsteinbach
Reviewed By: cwsteinbach
Differential Revision: https://reviews.facebook.net/D3249 (Revision 1340256)
Result = ABORTED
cws : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1340256
Files :
- /hive/trunk/contrib/src/test/queries/clientnegative/serde_regex.q
- /hive/trunk/contrib/src/test/queries/clientpositive/serde_regex.q
- /hive/trunk/contrib/src/test/results/clientnegative/serde_regex.q.out
- /hive/trunk/contrib/src/test/results/clientpositive/serde_regex.q.out
- /hive/trunk/ql/src/test/queries/clientnegative/serde_regex.q
- /hive/trunk/ql/src/test/queries/clientnegative/serde_regex2.q
- /hive/trunk/ql/src/test/queries/clientnegative/serde_regex3.q
- /hive/trunk/ql/src/test/queries/clientpositive/serde_regex.q
- /hive/trunk/ql/src/test/results/clientnegative/serde_regex.q.out
- /hive/trunk/ql/src/test/results/clientnegative/serde_regex2.q.out
- /hive/trunk/ql/src/test/results/clientnegative/serde_regex3.q.out
- /hive/trunk/ql/src/test/results/clientpositive/serde_regex.q.out
- /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/RegexSerDe.java
This issue is fixed and released as part of 0.10.0 release. If you find an issue which seems to be related to this one, please create a new jira and link this one with new jira.
Why is the RegexSerDe.java still exists in contrib directory? Is'nt it necessary to remove the file from contrib?
It was left in contrib so that we don't break backwards compatibility for existing users.
Shreepadma Venugopalan Could it be that you attached a wrong patch? I can't find any deprecation warning in the old RegexSerde so things are now confusing for everyone.
Note: we should probably leave a copy of the regexserde in the contrib module with a deprecation warning so that users have an easier migration route.