|
| 1 | +<h1>DPL-Parser</h1> |
| 2 | +<p>This package contains DPL-Parser and related walkers for generating different queries for Spark and Archive. |
| 3 | +</p> |
| 4 | +<h2>Parts</h2> |
| 5 | +<p>The package consists of a lexer and a parser, located under the ast directory, and post-processors located under the walker directory. |
| 6 | +In addition, jooq.generated contains the meta-models used with ConditionWalker.</p> |
| 7 | +<h3>Visitors</h3> |
| 8 | +<p>Visitors are the part of the DPL parser that generates either a SparkSQL or an XML representation of the incoming DPL query.</p> |
| 9 | +<ol> |
| 10 | +<li>DPLParserBaseVisitorImpl transforms the parse tree into SparkSQL queries.</li> |
| 11 | +<li>DPLParserXMLVisitor transforms the parse tree into an XML representation, which is later transformed into |
| 12 | +Conditions or Columns by the appropriate <i>walker</i>.</li> |
| 13 | +</ol> |
| 14 | +<h3>Walkers</h3> |
| 15 | +<ol> |
| 16 | +<li><p>XmlWalker is the base class that the individual walkers inherit from.</p> |
| 17 | +<p>- Abstract class</p> |
| 18 | +<p>- Contains a traverse method that walks the XML tree and calls, for each node, the emit method that every implementing walker must provide</p> |
| 19 | +<p>- Cannot be used on its own; a concrete implementation must be provided</p> |
| 20 | +</li> |
| 21 | +<li><p>ConditionWalker is an implementation that returns Spark Condition objects, which are used for Archive queries</p> |
| 22 | +<p>- Uses only the LogicalStatement part, so it does not implement the full DPL syntax, only the subset used in logical statements.</p></li> |
| 23 | +<li><p>DataframeWalker is an implementation that returns a Spark dataframe-manipulation pipeline, which can be used for manipulating data streams from Kafka and Archive</p> |
| 24 | +<p>- Should handle the full DPL syntax</p> |
| 25 | +<p>- Consists of the main DataframeWalker class and static emitters under the <i>dataframewalker directory</i>.</p> |
| 26 | +<p>- DataframeWalkerImpl selects which emitter to call as the walker traverses the XML document</p> |
| 27 | +</li></ol> |
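<p>The traverse/emit split described for XmlWalker can be sketched roughly as follows. This is only an illustration built on the JDK's DOM classes: the names <i>XmlWalkerSketch</i> and <i>ConditionStringWalker</i> are hypothetical, the real walkers return Condition objects or dataframe pipelines rather than strings, and the actual method signatures may differ.</p>

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical base class: walks the XML tree and hands every element to emit().
abstract class XmlWalkerSketch<T> {

    // Each concrete walker decides what a single element turns into.
    protected abstract T emit(Element element, List<T> childResults);

    // Depth-first traversal: child elements are emitted before their parent.
    public T traverse(Element element) {
        List<T> childResults = new ArrayList<>();
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                childResults.add(traverse((Element) child));
            }
        }
        return emit(element, childResults);
    }

    // Parses the XML text and walks it starting from the document element.
    public T fromString(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            return traverse(root);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }
}

// Hypothetical concrete walker: renders a logical statement as a plain string.
final class ConditionStringWalker extends XmlWalkerSketch<String> {
    @Override
    protected String emit(Element element, List<String> childResults) {
        String name = element.getTagName();
        if (name.equals("AND") || name.equals("OR")) {
            return "(" + String.join(" " + name + " ", childResults) + ")";
        }
        if (element.hasAttribute("operation")) {
            return name + " " + element.getAttribute("operation") + " " + element.getAttribute("value");
        }
        // Wrapper elements such as logicalStatement pass their children through.
        return String.join(" ", childResults);
    }
}
```

<p>With this sketch, walking a logicalStatement that holds an AND of <i>index</i> and <i>index_earliest</i> yields <i>(index EQUALS voyager AND index_earliest GE 1587021940)</i>. Keeping the traversal in the base class means a new walker only needs to supply its own emit logic.</p>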
| 28 | +<h2>How to use different parsers and Walkers</h2> |
| 29 | +<p>The following code sample illustrates how the parts of the parsing stack work together.</p> |
| 30 | +<blockquote> |
| 31 | + |
| 32 | +```java |
| 33 | + String q = "index = voyager _index_earliest=\"04/16/2020:10:25:40\" | chart count(_raw) as count by _time | where count > 70"; |
| 34 | + CharStream inputStream = CharStreams.fromString(q); |
| 35 | + DPLLexer lexer = new DPLLexer(inputStream); |
| 36 | + DPLParser parser = new DPLParser(new CommonTokenStream(lexer)); |
| 37 | + ParseTree tree = parser.root(); |
| 38 | + // Visit the parse tree to build the XML representation of the DPL-query |
| 39 | + DPLParserXMLVisitor visitor = new DPLParserXMLVisitor("-1Y",null, q); |
| 40 | + String fullResult = visitor.visit(tree).toString(); |
| 41 | + // Get the logical part, which is used for archive queries. |
| 42 | + // ConditionWalker consumes it. |
| 43 | + String logicalPart = visitor.getLogicalPart(); |
| 44 | + |
| 45 | + // Build the condition for the archive query from the logical part only. |
| 46 | + // conditionWalker is an already constructed ConditionWalker instance. |
| 47 | + String r = conditionWalker.fromString(logicalPart,false).toString(); |
| 48 | + |
| 49 | + // The full XML document drives the dataframe pipeline (dataframeWalker is an already constructed DataframeWalker) |
| 50 | + DataFrame<Row> result = dataframeWalker.fromString(fullResult); |
| 51 | +``` |
| 52 | +</blockquote> |
| 53 | +<ol> |
| 54 | +<li>The lexer analyzes the incoming DPL stream</li> |
| 55 | +<li>The DPL parser takes the token stream and generates a parse tree</li> |
| 56 | +<li>XmlVisitor transforms that parse tree into an XML document</li> |
| 57 | +<li>ConditionWalker transforms the query into an Archive query, using the XML tree as its source</li> |
| 58 | +<li>DataframeWalker transforms the query into processing pipelines executable on Spark, using the XML tree as its source</li> |
| 59 | +</ol> |
| 60 | +<h2>XML-sample</h2> |
| 61 | +<p>Here is the XML document produced from a simple DPL query. The XML tree is then used as the source for the Condition- and ColumnWalkers.</p> |
| 62 | +<p> |
| 63 | + |
| 64 | +```xml |
| 65 | + |
| 66 | +<root> |
| 67 | + <!--index = voyager _index_earliest="04/16/2020:10:25:40" | chart count(_raw) as count by _time | where count > 70--> |
| 68 | + <search root="true"> |
| 69 | + <logicalStatement> |
| 70 | + <AND> |
| 71 | + <index operation="EQUALS" value="voyager"/> |
| 72 | + <index_earliest operation="GE" value="1587021940"/> |
| 73 | + </AND> |
| 74 | + <transformStatements> |
| 75 | + <transform> |
| 76 | + <divideBy field="_time"> |
| 77 | + <chart field="_raw" fieldRename="count" function="count"> |
| 78 | + <transform> |
| 79 | + <where> |
| 80 | + <evalCompareStatement field="count" operation="GT" value="70"/> |
| 81 | + </where> |
| 82 | + </transform> |
| 83 | + </chart> |
| 84 | + </divideBy> |
| 85 | + </transform> |
| 86 | + </transformStatements> |
| 87 | + </logicalStatement> |
| 88 | + </search> |
| 89 | +</root> |
| 90 | +``` |
| 91 | + |
| 92 | +<h2>Source code structure</h2> |
| 93 | +<p>The code structure mirrors the parse tree structure fairly directly. Recognized parts are usually confined to their own classes. |
| 94 | +Here is the existing directory structure with brief explanations.</p> |
| 95 | +<h3>ast</h3> |
| 96 | +<p>Contains language-related classes.</p> |
| 97 | +<ol> |
| 98 | + <li><b><i>bo</i></b> contains value objects that are used during the DPL->Catalyst transformation to pass values forward. |
| 99 | +Notable classes are <i>StringNode, CatalystNode, ColumnNode and SubSearchNode.</i> |
| 100 | + </li> |
| 101 | +<li><b><i>commands</i></b> contains several visitor implementations, their support classes, and various context manipulation |
| 102 | +items. Individual commands are collected in their own classes or even packages. |
| 103 | +<ul><li><b><i>aggregate</i></b> contains the aggregate commands and their UDF implementations</li> |
| 104 | +<li><b><i>evalstatement</i></b> contains the eval functions, their UDF implementations, and the actual EvalStatement, which acts as the |
| 105 | +main selector that handles all recognized eval operations</li> |
| 106 | +<li><b><i>logicalstatement</i></b> contains the logicalStatement handling. In addition, there is a TimeStatement for handling |
| 107 | +time ranges and dates.</li> |
| 108 | +<li><b><i>transformstatement</i></b> handles the different transformation commands. Each transformation is implemented as a separate class. |
| 109 | +The actual integration point is <i>TransformStatement</i>, which selects which transformation to call and passes the results back.</li></ul></li> |
| 110 | +</ol> |
| 111 | +<h4>Visitors, contexts and utils</h4> |
| 112 | +<p>Visitors manage the different DPL->target language transformations. Currently there is a DPL->XML transformation, which is |
| 113 | +used when executing Archive queries, and a DPL->Catalyst transformation, which is run in the Spark cluster for Archive and Kafka |
| 114 | +streams.</p> |
| 115 | +<p>DPLCatalystContext is used for passing data between the parser and the runtime environment. It contains, for instance, |
| 116 | +DPLParserConfig and DPLAuditInformation. The main point is that all configuration and runtime-related data should be passed |
| 117 | +through it, because it is handed to the visitors and every transformation and command should have access to it.</p> |
| 118 | +<p>There are several utilities that handle, for instance, timestamp->epoch conversions, general helpers such as quote stripping, |
| 119 | +debugging information tools, and time calculations.</p> |
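<p>As an illustration of the timestamp->epoch conversion, the sketch below turns the <i>_index_earliest</i> value from the sample query into epoch seconds using <i>java.time</i>. The class name <i>EpochSketch</i> and the method <i>toEpochSeconds</i> are hypothetical and not part of this package. The time zone is also an assumption: the sample XML value 1587021940 corresponds to 04/16/2020:10:25:40 at UTC+3, which matches for example Europe/Helsinki; the real utilities may resolve the zone differently.</p>

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not the package's own utility class.
final class EpochSketch {

    // Pattern matching the timestamp format used in the sample query.
    private static final DateTimeFormatter DPL_TIME =
            DateTimeFormatter.ofPattern("MM/dd/yyyy:HH:mm:ss");

    private EpochSketch() {
    }

    // Interprets the timestamp as local time in the given zone (an assumption)
    // and returns the number of seconds since 1970-01-01T00:00:00Z.
    static long toEpochSeconds(String timestamp, String zoneId) {
        return LocalDateTime.parse(timestamp, DPL_TIME)
                .atZone(ZoneId.of(zoneId))
                .toEpochSecond();
    }
}
```

<p>For example, <i>EpochSketch.toEpochSeconds("04/16/2020:10:25:40", "Europe/Helsinki")</i> returns 1587021940, the value shown for <i>index_earliest</i> in the XML sample.</p>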
| 120 | +<p>ProcessingStack contains the current DataSet that the visitor constructs. Apart from that, it offers access to the Parallel and Sequential |
| 121 | +stacks, which can be manipulated. The UI can ask the visitor for the stack, determine whether it is serial, and act |
| 122 | +accordingly. Usage samples can be found in StackTest.</p> |
| 123 | +<h3>datasource</h3> |
| 124 | +<p><i>datasource</i> contains two datasource implementations.</p> |
| 125 | +<ol> |
| 126 | +<li><b><i>DPLDatasource</i></b> handles the Archive connection and pulls data from the Archive according to the query.</li> |
| 127 | +<li><b><i>GeneratedDatasource</i></b> offers a way to return a line or lines as a Spark in-memory dataframe. This functionality is |
| 128 | +used by parser state commands such as dpl, explain and teragrep.</li> |
| 129 | +</ol> |