Commit 347137e

Adjusted task names and error logging
1 parent 86e0cd0 commit 347137e

11 files changed

Lines changed: 54 additions & 34 deletions

README.md

Lines changed: 16 additions & 9 deletions
@@ -43,28 +43,32 @@ _Extraction/extraction-results_ once the extraction is complete, or has been sto
 Start the extraction by executing the `start-extraction` script (see examples further below).
 The basic syntax is the following:

-- `start-extraction.(sh|bat) (linux|busybox) [commit-id/tag] [commit-id/tag]`
+- `start-extraction.(sh|bat) (linux|busybox|GIT_HTTPS_LINK) [commit-id/tag] [commit-id/tag]`
 - `(option-1|option-2)` -> You *must* provide either a value for `option-1` or `option-2`.
 - `[option]` -> You *may* provide a value

-The script must be provided with `busybox` or `linux` as first argument, in order to specify which SPL should be considered.
+The script must be provided with `busybox`, `linux`, or the https clone link of a public git repository as the first argument, in order to specify which SPL should be considered.
 In addition, you can optionally provide either one or two more arguments specifying a commit-id or git-tag.

-If you specify __no__ id or tag, the entire history is considered.
+> If you specify `linux` or `busybox`, a full extraction, including the analysis of the build system, is performed for Linux or BusyBox, respectively. If you specify the clone link of a git repository, a partial extraction that does not consider the build system is performed. This is because the build system analysis requires a project-specific setup, which we have only done for Linux and BusyBox.

-If you specify __exactly one__ id or tag, the extraction will only consider the one commit that is found under the id/tag.
+> If you specify __no__ id or tag, the entire history is considered.
+
+> If you specify __exactly one__ id or tag, the extraction will only consider the one commit that is found under the id/tag.
 This can be used to quickly test whether everything is working as intended or to run the extraction for one commit only (e.g., when no evolution information is necessary).

-If you specify __two__ ids or tags, the extraction will consider the range of commits that lies between the first and the second
+> If you specify __two__ ids or tags, the extraction will consider the range of commits that lies between the first and the second
 commit. The commit retrieval follows the same logic as [git log](https://git-scm.com/docs/git-log), i.e., it will retrieve
 all commits that are ancestors of the second commit, but __not__ ancestors of the first commit.

 - Windows CMD:
     - `start-extraction.bat busybox [id/tag] [id/tag]`
     - `start-extraction.bat linux [id/tag] [id/tag]`
+    - `start-extraction.bat https://github.com/MarlinFirmware/Marlin.git [id/tag] [id/tag]`
 - Linux terminal:
     - `./start-extraction.sh busybox [id/tag] [id/tag]`
     - `./start-extraction.sh linux [id/tag] [id/tag]`
+    - `./start-extraction.sh https://github.com/MarlinFirmware/Marlin.git [id/tag] [id/tag]`

 #### Runtime
 The entire history of BusyBox can be extracted in about one day.
@@ -77,21 +81,24 @@ Therefore, errors that appear in the log do not necessarily indicate a problem w

 #### Examples:
 ```
-Extract the ground truth for all commits of BusyBox
+# Extract the ground truth for all commits of BusyBox
 start-extraction.bat busybox
 ./start-extraction.sh busybox

-Extract the ground truth between two specific commits of Busybox
+# Extract the ground truth between two specific commits of BusyBox
 start-extraction.bat busybox b35eef5383a4e7a6fb60fcf3833654a0bb2245e0 7de0ab21d939a5a304157f75918d0318a95261a3
 ./start-extraction.sh busybox b35eef5383a4e7a6fb60fcf3833654a0bb2245e0 7de0ab21d939a5a304157f75918d0318a95261a3

-Extract the ground truth for the commit under revision tag v4.1 of Linux
+# Extract the ground truth for the commit under revision tag v4.1 of Linux
 start-extraction.bat linux v4.1
 ./start-extraction.sh linux v4.1

-Extract the ground truth for all commits between two minor revisions of Linux
+# Extract the ground truth for all commits between two minor revisions of Linux
 start-extraction.bat linux v4.3 v4.4
 ./start-extraction.sh linux v4.3 v4.4
+
+# Extract a partial ground truth (no feature model, no file conditions) for the entire history of Marlin
+./start-extraction.sh https://github.com/MarlinFirmware/Marlin.git
 ```

 ### Stopping the Ground Truth Extraction
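
The two-argument rule above mirrors `git log A..B`: a commit is kept if it is reachable from the second id/tag but not from the first (git counts a commit as reachable from itself, so the second commit is included and the first is excluded). A minimal sketch of that set difference over a hypothetical in-memory history; `CommitRange`, `ancestorsOf`, `between` and the parent map are illustrative names and not part of the extractor, which works on JGit `RevCommit` objects.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of the commit-range rule; commits are plain ids, parents a map. */
final class CommitRange {

    /** Returns start together with every commit reachable through parent links. */
    static Set<String> ancestorsOf(String start, Map<String, List<String>> parents) {
        Set<String> reachable = new LinkedHashSet<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.push(start);
        while (!pending.isEmpty()) {
            String commit = pending.pop();
            if (reachable.add(commit)) { // visit each commit once, even across merges
                for (String parent : parents.getOrDefault(commit, List.of())) {
                    pending.push(parent);
                }
            }
        }
        return reachable;
    }

    /** Commits reachable from second but not from first, i.e. git's first..second. */
    static Set<String> between(String first, String second, Map<String, List<String>> parents) {
        Set<String> result = ancestorsOf(second, parents);
        result.removeAll(ancestorsOf(first, parents));
        return result;
    }
}
```

For a history where `d` merges `c` and `e`, both children of `b`, `between("b", "d", parents)` yields `c`, `d` and `e`: the merged side branch is included even though it is not on a straight line between the two commits.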

docker-resources/extraction_busybox.properties

Lines changed: 1 addition & 1 deletion
@@ -47,7 +47,7 @@ source_tree = ./busybox
 source_repo_url = https://git.busybox.net/busybox/

 # Do not change this
-analysis.class = org.variantsync.vevos.extraction.kh.FullAnalysis
+analysis.class = org.variantsync.vevos.extraction.kh.FullExtraction
 preparation.class.0 = net.ssehub.kernel_haven.busyboot.PrepareBusybox
 analysis.output.type = csv

docker-resources/extraction_generic.properties

Lines changed: 2 additions & 2 deletions
@@ -17,7 +17,7 @@ extraction.timeout = 60
 ### Analysis Parameters ###
 # How many commits should be analyzed in parallel? Warning: each task requires a considerable amount
 # of resources
-analysis.number_of_tasks = 5
+analysis.number_of_tasks = 3

 # Whether the file condition (aka. presence condition of source file) should be treated as 'true' or 'false' (default is 'true'),
 # in case of missing build model information (i.e., no feature model or file condition)
@@ -40,7 +40,7 @@ source_tree = TBD
 source_repo_url = TBD

 # Do not change this
-analysis.class = org.variantsync.vevos.extraction.kh.PartialAnalysis
+analysis.class = org.variantsync.vevos.extraction.kh.PartialExtraction
 analysis.output.type = csv

 ######################################
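
The first script argument decides between the full pipeline (Linux, BusyBox) and the partial pipeline configured by `extraction_generic.properties`, whose `source_tree` and `source_repo_url` are left as `TBD`. A minimal sketch of that dispatch, assuming the generic template is filled from the clone link; `ExtractionTarget` and its fields are hypothetical, and deriving `source_tree` from the repository name is an illustrative choice, not necessarily what the start scripts do. The file names, URLs and source trees for Linux and BusyBox are the ones in the docker-resources properties files.

```java
import java.net.URI;

/** Hypothetical mapping from the first start-extraction argument to a properties template. */
final class ExtractionTarget {
    enum Mode { FULL, PARTIAL }

    final Mode mode;
    final String propertiesFile;
    final String sourceRepoUrl;
    final String sourceTree;

    private ExtractionTarget(Mode mode, String propertiesFile, String sourceRepoUrl, String sourceTree) {
        this.mode = mode;
        this.propertiesFile = propertiesFile;
        this.sourceRepoUrl = sourceRepoUrl;
        this.sourceTree = sourceTree;
    }

    static ExtractionTarget fromArgument(String arg) {
        switch (arg) {
            case "linux":
                return new ExtractionTarget(Mode.FULL, "extraction_linux.properties",
                        "https://github.com/torvalds/linux.git", "./linux");
            case "busybox":
                return new ExtractionTarget(Mode.FULL, "extraction_busybox.properties",
                        "https://git.busybox.net/busybox/", "./busybox");
            default:
                // Any other argument must be an https clone link; it gets the partial pipeline.
                URI uri = URI.create(arg); // throws IllegalArgumentException if malformed
                String path = uri.getPath();
                if (!"https".equalsIgnoreCase(uri.getScheme()) || path == null || path.isEmpty()) {
                    throw new IllegalArgumentException("Expected linux, busybox, or an https clone link: " + arg);
                }
                // Assumption: the checkout is named after the last path segment, minus ".git".
                String name = path.substring(path.lastIndexOf('/') + 1);
                if (name.endsWith(".git")) {
                    name = name.substring(0, name.length() - ".git".length());
                }
                if (name.isEmpty()) {
                    throw new IllegalArgumentException("Cannot derive a repository name from: " + arg);
                }
                return new ExtractionTarget(Mode.PARTIAL, "extraction_generic.properties", arg, "./" + name);
        }
    }
}
```

With this mapping, `https://github.com/MarlinFirmware/Marlin.git` selects the generic template with `source_tree = ./Marlin`, while anything that is neither a known SPL name nor an https link is rejected before any extraction starts.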

docker-resources/extraction_linux.properties

Lines changed: 1 addition & 1 deletion
@@ -46,7 +46,7 @@ source_tree = ./linux
 source_repo_url = https://github.com/torvalds/linux.git

 # Do not change this
-analysis.class = org.variantsync.vevos.extraction.kh.FullAnalysis
+analysis.class = org.variantsync.vevos.extraction.kh.FullExtraction
 analysis.output.type = csv

 ######################################
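
All three properties files select the pipeline through `analysis.class`, and `Extraction` derives its `fullExtraction` flag from that value. A minimal sketch of such a check with `java.util.Properties`, comparing the simple class name exactly; `ExtractionMode` is a hypothetical helper, since the extractor itself reads the value through KernelHaven's `Configuration`. Because the check depends on the class name, it has to follow renames: after this commit, a test for the old name `FullAnalysis` matches none of the updated files.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical helper: decide full vs. partial extraction from analysis.class. */
final class ExtractionMode {
    /** Simple name of the full pipeline class after the rename in this commit. */
    static final String FULL_PIPELINE = "FullExtraction";

    /** Parses properties text such as the contents of extraction_linux.properties. */
    static Properties parse(String text) {
        Properties properties = new Properties();
        try {
            properties.load(new StringReader(text));
        } catch (IOException e) {
            // StringReader performs no real I/O, but load() still declares IOException.
            throw new UncheckedIOException(e);
        }
        return properties;
    }

    /** True if analysis.class names the full pipeline, false for any other pipeline. */
    static boolean isFullExtraction(Properties properties) {
        String analysisClass = properties.getProperty("analysis.class");
        if (analysisClass == null) {
            throw new IllegalStateException("analysis.class is not set");
        }
        String trimmed = analysisClass.trim(); // load() keeps trailing whitespace
        String simpleName = trimmed.substring(trimmed.lastIndexOf('.') + 1);
        return simpleName.equals(FULL_PIPELINE);
    }
}
```

Comparing the whole simple name rather than a suffix also avoids accidental matches with unrelated classes whose names happen to end the same way.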

pom.xml

Lines changed: 1 addition & 1 deletion
@@ -130,7 +130,7 @@
         <dependency>
             <groupId>net.lingala.zip4j</groupId>
             <artifactId>zip4j</artifactId>
-            <version>2.9.0</version>
+            <version>2.10.0</version>
         </dependency>
     </dependencies>
 </project>

src/main/java/org/variantsync/vevos/extraction/AnalysisTask.java

Lines changed: 20 additions & 11 deletions
@@ -43,14 +43,16 @@ public class AnalysisTask implements Runnable {
     private final File parentPropertiesFile;
     private final String splName;
     private final long timeout;
+    private final boolean fullExtraction;

-    public AnalysisTask(List<RevCommit> commits, File parentDir, File propertiesFile, String splName, long timeout) {
+    public AnalysisTask(List<RevCommit> commits, File parentDir, File propertiesFile, String splName, long timeout, boolean fullExtraction) {
         this.commits = commits;
         this.parentDir = parentDir;
         this.parentPropertiesFile = propertiesFile;
         this.splName = splName;
         this.taskNumber = existingTasksCount++;
         this.timeout = timeout;
+        this.fullExtraction = fullExtraction;
     }

     @Override
@@ -168,7 +170,7 @@ private Set<String> determineProcessedCommits(Path pathToTargetDir) {
         return processedCommits;
     }

-    private static void moveResultsToDirectory(File workDir, Path pathToTargetDir, RevCommit commit, File prepareFail) {
+    private void moveResultsToDirectory(File workDir, Path pathToTargetDir, RevCommit commit, File prepareFail) {
         String commitId = commit.getName();
         LOGGER.logStatus("Moving result to common output directory.");
         File data_collection_dir = pathToTargetDir.resolve("data").resolve(commitId).toFile();
@@ -186,8 +188,11 @@ private static void moveResultsToDirectory(File workDir, Path pathToTargetDir, R
         // Move the results of the analysis to the collected output directory according to the current commit
         LOGGER.logStatus("Moving presence conditions to common output directory.");
         boolean hasError = movePresenceConditions(outputDir, data_collection_dir);
+
         LOGGER.logStatus("Moving DIMACS feature model to common output directory.");
-        hasError = hasError | moveDimacsModel(outputDir, data_collection_dir);
+        if (fullExtraction) {
+            hasError = hasError | moveDimacsModel(outputDir, data_collection_dir);
+        }

         LOGGER.logStatus("Moving FILTERED file to common output directory.");
         if(moveFilterCount(outputDir, data_collection_dir)) {
@@ -201,7 +206,9 @@ private static void moveResultsToDirectory(File workDir, Path pathToTargetDir, R

         // Move the cache of the extractors to the collected output directory
         LOGGER.logStatus("Moving extractor cache to common output directory.");
-        hasError = hasError | moveFeatureModel(workDir, data_collection_dir);
+        if (fullExtraction) {
+            hasError = hasError | moveFeatureModel(workDir, data_collection_dir);
+        }

         // Move the log to the common output directory
         LOGGER.logStatus("Moving KernelHaven log to common output directory");
@@ -219,7 +226,7 @@ private static void moveResultsToDirectory(File workDir, Path pathToTargetDir, R
         } else {
             writeParents(commit, data_collection_dir);
             writeToFile(data_collection_dir, COMMIT_MESSAGE_FILE, commit.getFullMessage());
-            if (prepareFail.exists()) {
+            if (prepareFail.exists() && this.fullExtraction) {
                 LOGGER.logWarning("KernelHaven was not able to correctly load the build model, the extracted file presence conditions are incomplete!");
                 EXECUTOR.execute("echo \"" + commitId + "\" >> " + INCOMPLETE_PC_COMMIT_FILE, pathToTargetDir.toFile());
             } else {
@@ -287,11 +294,13 @@ private static boolean moveFeatureModel(File workDir, File targetDir) {
         return hasError;
     }

-    private static boolean moveOutputFile(File outputDir, File targetDir, String sourceName, String targetName) {
+    private static boolean moveOutputFile(File outputDir, File targetDir, String sourceName, String targetName, boolean errorExpected) {
         boolean hasError = false;
         File[] resultFiles = outputDir.listFiles((dir, name) -> name.contains(sourceName));
         if (resultFiles == null || resultFiles.length == 0) {
-            LOGGER.logError("NO RESULT FILE IN " + outputDir.getAbsolutePath());
+            if (!errorExpected) {
+                LOGGER.logError("NO RESULT FILE IN " + outputDir.getAbsolutePath());
+            }
             hasError = true;
         } else if (resultFiles.length == 1) {
             try {
@@ -313,19 +322,19 @@ private static boolean moveOutputFile(File outputDir, File targetDir, String sou
     }

     private static boolean movePresenceConditions(File outputDir, File targetDir) {
-        return moveOutputFile(outputDir, targetDir, "Blocks.csv", "code-variability.spl.csv");
+        return moveOutputFile(outputDir, targetDir, "Blocks.csv", "code-variability.spl.csv", false);
     }

     private static boolean moveDimacsModel(File outputDir, File targetDir) {
-        return moveOutputFile(outputDir, targetDir, "feature-model.dimacs", "feature-model.dimacs");
+        return moveOutputFile(outputDir, targetDir, "feature-model.dimacs", "feature-model.dimacs", false);
     }

     private static boolean moveFilterCount(File outputDir, File targetDir) {
-        return moveOutputFile(outputDir, targetDir, "FILTERED.txt", "FILTERED.txt");
+        return moveOutputFile(outputDir, targetDir, "FILTERED.txt", "FILTERED.txt", true);
     }

     private static boolean moveVariablesFile(File outputDir, File targetDir) {
-        return moveOutputFile(outputDir, targetDir, "VARIABLES.txt", "VARIABLES.txt");
+        return moveOutputFile(outputDir, targetDir, "VARIABLES.txt", "VARIABLES.txt", true);
     }

     private void createBlocker(File dir) {
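
`moveOutputFile` now takes an `errorExpected` flag: a missing `FILTERED.txt` or `VARIABLES.txt` is still reported to the caller as an error, but is no longer logged. A minimal sketch of that contract using `java.nio.file` and `java.util.logging` in place of the project's `LOGGER`; `OutputMover` is a hypothetical name, and treating several matching files as an error is an assumption, since that branch is not shown in the diff.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.logging.Logger;

/** Hypothetical sketch of the errorExpected contract of moveOutputFile. */
final class OutputMover {
    private static final Logger LOG = Logger.getLogger(OutputMover.class.getName());

    /**
     * Moves the single file in outputDir whose name contains sourceName to targetDir/targetName.
     * Returns true if the move did not happen; a missing file is only logged if it is unexpected.
     */
    static boolean moveOutputFile(File outputDir, File targetDir, String sourceName,
                                  String targetName, boolean errorExpected) {
        File[] resultFiles = outputDir.listFiles((dir, name) -> name.contains(sourceName));
        if (resultFiles == null || resultFiles.length == 0) {
            if (!errorExpected) {
                LOG.severe("NO RESULT FILE IN " + outputDir.getAbsolutePath());
            }
            return true; // still an error for the caller, just not a logged one
        }
        if (resultFiles.length > 1) {
            // Assumption: several matches count as an error (this branch is not in the diff).
            LOG.severe("MORE THAN ONE RESULT FILE IN " + outputDir.getAbsolutePath());
            return true;
        }
        try {
            Path target = targetDir.toPath().resolve(targetName);
            Files.createDirectories(targetDir.toPath());
            Files.move(resultFiles[0].toPath(), target, StandardCopyOption.REPLACE_EXISTING);
            return false;
        } catch (IOException e) {
            LOG.severe("Could not move " + resultFiles[0] + ": " + e.getMessage());
            return true;
        }
    }
}
```

The callers accumulate results with the non-short-circuiting `|`, as in `hasError = hasError | moveDimacsModel(...)`, so every move still runs after an earlier one failed; with `||`, the remaining moves would be skipped as soon as `hasError` is `true`.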

src/main/java/org/variantsync/vevos/extraction/Extraction.java

Lines changed: 5 additions & 1 deletion
@@ -45,6 +45,8 @@ public class Extraction {
     public static final @NonNull Setting<@Nullable Integer> EXTRACTION_TIMEOUT
             = new Setting<>("extraction.timeout", Setting.Type.INTEGER, false, "0", "" +
             "The timeout for the KernelHaven execution in seconds.");
+    public static final @NonNull Setting<@Nullable String> ANALYSIS_CLASS
+            = new Setting<>("analysis.class", Setting.Type.STRING, true, null, "Class of the pipeline that is used for the analysis");
     private static final Logger LOGGER = Logger.get();
     private static final ShellExecutor EXECUTOR = new ShellExecutor(LOGGER);

@@ -104,9 +106,10 @@ public static void main(String... args) throws IOException, GitAPIException {
         // Create a task for each commit subset and submit it to the thread pool
         int count = 0;
         LOGGER.logStatus("Scheduling tasks...");
+        boolean fullExtraction = config.getValue(ANALYSIS_CLASS).endsWith("FullExtraction");
         for (List<RevCommit> commitSubset : commitSubsets) {
             count += commitSubset.size();
-            threadPool.submit(new AnalysisTask(commitSubset, workingDirectory, propertiesFile, splDir.getName(), config.getValue(EXTRACTION_TIMEOUT)));
+            threadPool.submit(new AnalysisTask(commitSubset, workingDirectory, propertiesFile, splDir.getName(), config.getValue(EXTRACTION_TIMEOUT), fullExtraction));
         }
         LOGGER.logStatus("all " + commitSubsets.size() + " tasks scheduled.");
         threadPool.shutdown();
@@ -157,6 +160,7 @@ private static Configuration getConfiguration(File propertiesFile) {
             config.registerSetting(RESULT_REPO_COMMITTER_NAME);
             config.registerSetting(RESULT_REPO_COMMITTER_EMAIL);
             config.registerSetting(EXTRACTION_TIMEOUT);
+            config.registerSetting(ANALYSIS_CLASS);
         } catch (SetUpException e) {
             LOGGER.logError("Invalid configuration detected:", e.getMessage());
             quitOnError();
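
`Extraction` submits one `AnalysisTask` per commit subset to a thread pool, and `analysis.number_of_tasks` (lowered from 5 to 3 in `extraction_generic.properties`) controls how many commits are analyzed in parallel, per the comment in that file. The diff does not show how `commitSubsets` is built, so the sketch below assumes a round-robin split and a fixed pool of `number_of_tasks` threads; `TaskScheduling`, `partition` and `runAll` are hypothetical names, and plain strings stand in for `RevCommit`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical sketch: split commits into subsets and analyze each subset on a bounded pool. */
final class TaskScheduling {

    /** Round-robin split into at most n non-empty subsets of near-equal size. */
    static <T> List<List<T>> partition(List<T> commits, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("number of tasks must be positive: " + n);
        }
        int subsetCount = Math.min(n, commits.size());
        List<List<T>> subsets = new ArrayList<>();
        for (int i = 0; i < subsetCount; i++) {
            subsets.add(new ArrayList<>());
        }
        for (int i = 0; i < commits.size(); i++) {
            subsets.get(i % subsetCount).add(commits.get(i));
        }
        return subsets;
    }

    /** Submits one task per subset to a pool of numberOfTasks threads and waits for completion. */
    static <T> void runAll(List<T> commits, int numberOfTasks, Consumer<List<T>> analyze)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(numberOfTasks);
        for (List<T> subset : partition(commits, numberOfTasks)) {
            // As with submit() in Extraction, an exception thrown here is captured in the
            // returned Future and is not reported unless someone calls get() on it.
            pool.submit(() -> analyze.accept(subset));
        }
        pool.shutdown();
        if (!pool.awaitTermination(1, TimeUnit.HOURS)) {
            pool.shutdownNow();
        }
    }
}
```

A round-robin split keeps the subsets within one commit of each other in size; since each task needs "a considerable amount of resources", lowering `number_of_tasks` trades throughput for a smaller peak memory and CPU footprint.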

src/main/java/org/variantsync/vevos/extraction/kh/FullAnalysis.java renamed to src/main/java/org/variantsync/vevos/extraction/kh/FullExtraction.java

Lines changed: 3 additions & 3 deletions
@@ -7,14 +7,14 @@
 import net.ssehub.kernel_haven.util.null_checks.NonNull;
 import net.ssehub.kernel_haven.fe_analysis.pcs.CodeBlockAnalysis;

-public class FullAnalysis extends PipelineAnalysis {
+public class FullExtraction extends PipelineAnalysis {

     /**
-     * Creates a new {@link FullAnalysis}.
+     * Creates a new {@link FullExtraction}.
      *
      * @param config The global configuration.
      */
-    public FullAnalysis(@NonNull Configuration config) {
+    public FullExtraction(@NonNull Configuration config) {
         super(config);
     }

src/main/java/org/variantsync/vevos/extraction/kh/PartialAnalysis.java renamed to src/main/java/org/variantsync/vevos/extraction/kh/PartialExtraction.java

Lines changed: 3 additions & 3 deletions
@@ -7,14 +7,14 @@
 import net.ssehub.kernel_haven.fe_analysis.pcs.CodeBlockAnalysis;
 import net.ssehub.kernel_haven.util.null_checks.NonNull;

-public class PartialAnalysis extends PipelineAnalysis {
+public class PartialExtraction extends PipelineAnalysis {

     /**
-     * Creates a new {@link PartialAnalysis}.
+     * Creates a new {@link PartialExtraction}.
      *
      * @param config The global configuration.
      */
-    public PartialAnalysis(@NonNull Configuration config) {
+    public PartialExtraction(@NonNull Configuration config) {
         super(config);
     }

src/main/resources/extraction_busybox.properties

Lines changed: 1 addition & 1 deletion
@@ -49,7 +49,7 @@ analysis.code_block.consider_missing_bm_infos = true
 #######################################

 # Do not change this
-analysis.class = org.variantsync.vevos.extraction.kh.FullAnalysis
+analysis.class = org.variantsync.vevos.extraction.kh.FullExtraction
 preparation.class.0 = net.ssehub.kernel_haven.busyboot.PrepareBusybox
 analysis.output.type = csv