Commit 8eb29b5

Merge pull request #1 from StrongestNumber9/main
Initial public release
2 parents 2d3b68d + 5848f50 commit 8eb29b5

278 files changed

Lines changed: 61415 additions & 16 deletions

Lines changed: 35 additions & 0 deletions
```yaml
name: Maven Package

on:
  push:

jobs:
  build:

    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write

    steps:
    - uses: actions/checkout@v3
      with:
        fetch-depth: 0

    - name: Set up JDK 8
      uses: actions/setup-java@v3
      with:
        java-version: '8'
        distribution: 'temurin'
        server-id: github
        settings-path: ${{ github.workspace }}

    - name: Get version
      run: echo "RELEASE_VERSION=$(git describe --tags)" >> $GITHUB_ENV

    - name: Publish to GitHub Packages Apache Maven
      run: mvn -Pbuild -B -Drevision=${{ env.RELEASE_VERSION }} -Dsha1= -Dchangelist= deploy -s dependencies.settings.xml
      env:
        GITHUB_TOKEN: ${{ github.token }}
        SRV_CI_USER: ${{ secrets.SRV_CI_USER }}
        SRV_CI_TOKEN: ${{ secrets.SRV_CI_TOKEN }}
```
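The version step can be dry-run locally: `git describe --tags` resolves the nearest tag, which the deploy step then injects as Maven's CI-friendly `revision` property. A minimal sketch in a throwaway repository (the tag name `1.0.0` is a placeholder, not a tag from this project):

```shell
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q
# An empty commit plus a tag is enough for git describe to resolve a version.
git -c user.email=ci@example.com -c user.name=ci commit -q --allow-empty -m init
git tag 1.0.0
RELEASE_VERSION=$(git describe --tags)
# The workflow then effectively runs:
#   mvn -Pbuild -B -Drevision=$RELEASE_VERSION -Dsha1= -Dchangelist= deploy ...
echo "$RELEASE_VERSION"
```

On an exactly tagged commit this prints the tag itself; on later commits `git describe` appends the commit distance and abbreviated hash.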

.gitignore

Lines changed: 10 additions & 16 deletions
```diff
@@ -1,17 +1,11 @@
-target/
-pom.xml.tag
-pom.xml.releaseBackup
-pom.xml.versionsBackup
-pom.xml.next
-release.properties
-dependency-reduced-pom.xml
-buildNumber.properties
-.mvn/timing.properties
-# https://github.com/takari/maven-wrapper#usage-without-binary-jar
-.mvn/wrapper/maven-wrapper.jar
-
-# Eclipse m2e generated files
-# Eclipse Core
+/target/
+/.settings/
+/.vscode/
+/.idea/
+/*.iml
+/.classpath
+/.factorypath
+/.flattened-pom.xml
+fuzzer/**
+output.log
 .project
-# JDT-specific (Eclipse Java Development Tools)
-.classpath
```

LICENSE

Lines changed: 687 additions & 0 deletions
Large diffs are not rendered by default.

README.md

Lines changed: 129 additions & 0 deletions
<h1>DPL-Parser</h1>
<p>This package contains the DPL-Parser and related walkers for generating different queries for Spark and Archive.</p>
<h2>Parts</h2>
<p>The package consists of a lexer and a parser, packed under the ast directory, and post-processors packed under the walker directory.
In addition, jooq.generated contains the meta-models used with ConditionWalker.</p>
<h3>Visitors</h3>
<p>Visitors are the part of the DPL parser that generates either a SparkSQL or an XML representation from an incoming DPL query.</p>
<ol>
<li>DPLParserBaseVisitorImpl transforms the parse tree into SparkSQL queries.</li>
<li>DPLParserXMLVisitor transforms the parse tree into an XML representation, which is later transformed into
Conditions or Columns using the appropriate <i>walker</i>.</li>
</ol>
<h3>Walkers</h3>
<ol>
<li><p>XmlWalker is the base class from which individual walkers inherit.</p>
<p>- An abstract class.</p>
<p>- Contains a traverse method that walks the XML tree and, for each node, calls the emit method that each implementing walker must provide.</p>
<p>- Cannot be used alone; a concrete implementation must be provided.</p>
</li>
<li><p>ConditionWalker is an implementation returning Spark Condition objects, which are used for Archive queries.</p>
<p>- Uses only the LogicalStatement part, so it does not implement the full DPL syntax, only the subset used in logical statements.</p></li>
<li><p>DataframeWalker is an implementation returning a Spark dataframe-manipulation pipeline, which can be used for manipulating data streams from Kafka and Archive.</p>
<p>- Should handle the full DPL syntax.</p>
<p>- Consists of the main DataframeWalker class and static emitters under the <i>dataframewalker</i> directory.</p>
<p>- DataframeWalkerImpl selects which emitter to call as the walker travels through the XML document.</p>
</li></ol>
<h2>How to use the different parsers and walkers</h2>
<p>The following code sample shows how the parsing stack works together.</p>
<blockquote>

```java
String q = "index = voyager _index_earliest=\"04/16/2020:10:25:40\" | chart count(_raw) as count by _time | where count > 70";
CharStream inputStream = CharStreams.fromString(q);
DPLLexer lexer = new DPLLexer(inputStream);
DPLParser parser = new DPLParser(new CommonTokenStream(lexer));
// Parse the incoming DPL query
ParseTree tree = parser.root();
DPLParserXMLVisitor visitor = new DPLParserXMLVisitor("-1Y", null, q);
String fullResult = visitor.visit(tree).toString();
// Get the logical part, which is used for Archive queries;
// ConditionWalker is used for that.
String logicalPart = visitor.getLogicalPart();

// Check the condition for the Archive query, i.e. only the logical part.
// conditionWalker is assumed to be a previously constructed ConditionWalker.
String r = conditionWalker.fromString(logicalPart, false).toString();

// The full query generates the dataframe-manipulation pipeline.
// dataframeWalker is assumed to be a previously constructed DataframeWalker.
Dataset<Row> result = dataframeWalker.fromString(fullResult);
```
</blockquote>
<ol>
<li>The lexer analyzes the incoming DPL stream.</li>
<li>The DPL parser takes the token stream and generates a parse tree.</li>
<li>The XML visitor transforms that parse tree into an XML document.</li>
<li>ConditionWalker transforms the query for Archive, using the XML tree as a source.</li>
<li>DataframeWalker transforms the query into Spark-executable processing pipes, using the XML tree as a source.</li>
</ol>
<h2>XML sample</h2>
<p>Here is the resulting XML document from a simple DPL query. That XML tree is then used as a source for the Condition- and ColumnWalkers.</p>

```xml
<root>
  <!--index = voyager _index_earliest="04/16/2020:10:25:40" | chart count(_raw) as count by _time | where count > 70-->
  <search root="true">
    <logicalStatement>
      <AND>
        <index operation="EQUALS" value="voyager"/>
        <index_earliest operation="GE" value="1587021940"/>
      </AND>
      <transformStatements>
        <transform>
          <divideBy field="_time">
            <chart field="_raw" fieldRename="count" function="count">
              <transform>
                <where>
                  <evalCompareStatement field="count" operation="GT" value="70"/>
                </where>
              </transform>
            </chart>
          </divideBy>
        </transform>
      </transformStatements>
    </logicalStatement>
  </search>
</root>
```
<h2>Source code structure</h2>
<p>The code structure reflects the parse-tree structure rather directly. Recognized parts are usually confined to their own classes.
Here is the existing directory structure with brief explanations.</p>
<h3>ast</h3>
<p>Contains language-related classes.</p>
<ol>
<li><b><i>bo</i></b> contains value objects which are used during the DPL->Catalyst transformation to pass values forward.
Notable classes are <i>StringNode, CatalystNode, ColumnNode and SubSearchNode</i>.
</li>
<li><b><i>commands</i></b> contains several visitor implementations, their support classes and different context-manipulation
items. Individual commands are collected in their own classes or even packages.
<ul>
<li><b><i>aggregate</i></b> contains the aggregate commands and their UDF implementations.</li>
<li><b><i>evalstatement</i></b> contains the eval functions, their UDF implementations and the actual EvalStatement, which acts as the
main selector handling all recognized eval operations.</li>
<li><b><i>logicalstatement</i></b> contains logicalStatement handling. In addition, there is also TimeStatement for handling
time ranges and dates.</li>
<li><b><i>transformstatement</i></b> handles the different transformation commands. Each transformation is implemented as a separate class.
The actual integration point is <i>TransformStatement</i>, which selects which transformation to call and passes the results back.</li>
</ul>
</li>
</ol>
<h4>Visitors, contexts and utils</h4>
<p>Visitors manage the different DPL->target-language transformations. Currently there is a DPL->XML transformation, which is
used when executing Archive queries, and a DPL->Catalyst transformation, which is run in the Spark cluster for Archive and Kafka
streams.</p>
<p>DPLCatalystContext is used for passing data between the parser and the runtime environment. It contains, for instance,
DPLParserConfig and DPLAuditInformation. The main point is that all configuration and runtime-related data should be passed
through it, because it is passed through the visitors, and each transformation and command should have access to it.</p>
<p>There are several utils which manage, for instance, timestamp->epoch manipulations, general tasks like quote stripping,
some debugging-information tools and time calculations.</p>
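<p>The timestamp->epoch handling can be illustrated with a small sketch. This is a hypothetical helper, not the parser's actual util class; the Europe/Helsinki zone is an assumption inferred from the XML sample, where "04/16/2020:10:25:40" became 1587021940.</p>

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

public class TimestampSketch {
    // Hypothetical illustration of the timestamp->epoch utils:
    // DPL timestamps use the MM/dd/yyyy:HH:mm:ss form.
    static long toEpoch(String ts) {
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern("MM/dd/yyyy:HH:mm:ss");
        // Assumption: timestamps are interpreted in the Europe/Helsinki zone,
        // which reproduces the value seen in the XML sample above.
        return LocalDateTime.parse(ts, fmt)
                .atZone(ZoneId.of("Europe/Helsinki"))
                .toEpochSecond();
    }

    public static void main(String[] args) {
        System.out.println(toEpoch("04/16/2020:10:25:40")); // prints 1587021940
    }
}
```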
<p>ProcessingStack contains the current DataSet which the visitor constructs. Apart from that, it offers access to the Parallel and Sequential
stacks, which can be manipulated. The UI can ask the visitor for the stack, resolve whether it is serial, and act
accordingly. Usage samples can be found in StackTest.</p>
<h3>datasource</h3>
<p><i>datasource</i> contains two datasource implementations.</p>
<ol>
<li><b><i>DPLDatasource</i></b> handles the Archive connection and pulls data from the Archive according to the query.</li>
<li><b><i>GeneratedDatasource</i></b> offers a way to return a line or lines as Spark in-memory dataframes. That functionality is
utilized in parser-state commands like dpl, explain and teragrep.</li>
</ol>

dependencies.settings.xml

Lines changed: 85 additions & 0 deletions
```xml
<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.0.0
                              http://maven.apache.org/xsd/settings-1.0.0.xsd">
  <activeProfiles>
    <activeProfile>github-packages</activeProfile>
  </activeProfiles>
  <profiles>
    <profile>
      <id>github-packages</id>
      <repositories>
        <!-- general dependencies - explicitly set for efficiency reasons -->
        <repository>
          <id>central</id>
          <url>https://repo1.maven.org/maven2</url>
        </repository>
        <!-- dependencies -->
        <repository>
          <id>pth_03</id>
          <url>https://maven.pkg.github.com/teragrep/pth_03</url>
        </repository>
        <repository>
          <id>jue_01</id>
          <url>https://maven.pkg.github.com/teragrep/jue_01</url>
        </repository>
        <repository>
          <id>rlo_06</id>
          <url>https://maven.pkg.github.com/teragrep/rlo_06</url>
        </repository>
        <repository>
          <id>pth_06</id>
          <url>https://maven.pkg.github.com/teragrep/pth_06</url>
        </repository>
        <repository>
          <id>dpf_02</id>
          <url>https://maven.pkg.github.com/teragrep/dpf_02</url>
        </repository>
        <repository>
          <id>dpf_03</id>
          <url>https://maven.pkg.github.com/teragrep/dpf_03</url>
        </repository>
      </repositories>
    </profile>
  </profiles>
  <servers>
    <!-- private dependencies -->
    <server>
      <id>pth_03</id>
      <username>${env.SRV_CI_USER}</username>
      <password>${env.SRV_CI_TOKEN}</password>
    </server>
    <!-- public dependencies -->
    <server>
      <id>jue_01</id>
      <username>${env.GITHUB_ACTOR}</username>
      <password>${env.GITHUB_TOKEN}</password>
    </server>
    <server>
      <id>rlo_06</id>
      <username>${env.GITHUB_ACTOR}</username>
      <password>${env.GITHUB_TOKEN}</password>
    </server>
    <server>
      <id>pth_06</id>
      <username>${env.GITHUB_ACTOR}</username>
      <password>${env.GITHUB_TOKEN}</password>
    </server>
    <server>
      <id>dpf_02</id>
      <username>${env.GITHUB_ACTOR}</username>
      <password>${env.GITHUB_TOKEN}</password>
    </server>
    <server>
      <id>dpf_03</id>
      <username>${env.GITHUB_ACTOR}</username>
      <password>${env.GITHUB_TOKEN}</password>
    </server>
    <!-- for uploading -->
    <server>
      <id>github</id>
      <username>${env.GITHUB_ACTOR}</username>
      <password>${env.GITHUB_TOKEN}</password>
    </server>
  </servers>
</settings>
```
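The `${env.*}` placeholders in the servers section are resolved from environment variables at build time, which is how the CI job above supplies its credentials. A minimal local sketch, assuming placeholder credentials (real use needs a GitHub account and a personal access token with package-read scope):

```shell
# Assumption: placeholder values, substitute your own credentials.
export GITHUB_ACTOR="your-github-user"
export GITHUB_TOKEN="your-personal-access-token"
# Locally a plain package build against the same settings file is the
# safer dry run; only CI should run the deploy goal.
cmd="mvn -Pbuild -B -s dependencies.settings.xml package"
echo "$cmd"
```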
