C# Soutien/Tutoring TP – SetLoader
- Skeleton
- .set format
- Lines Reader + Filter
- Stage 2: Expanding
- Stage 3: Transformation
- Stage 4: Generic Transformation
This mini-subject shows you, through an example, how and why to use some of the features we showed you.
In this subject, we will go over:
- File I/O
- Functions as parameters
- Generics
- Sets
The goal of this subject is to implement a program that parses .set files and fills a HashSet with their contents.
Skeleton
Here is the basic scaffolding for this TP:
using System;
using System.Collections.Generic;

public static class SetParser
{
    // ========================================
    // STAGE 1
    // ========================================
    public static List<string> LoadAndFilterLines(string path)
    {
        // FIXME
        throw new NotImplementedException();
    }

    // ========================================
    // STAGE 2
    // ========================================
    public static List<string> LoadAndExpand(string filePath)
    {
        // FIXME
        throw new NotImplementedException();
    }

    // ========================================
    // STAGE 3
    // ========================================
    public static HashSet<string> LinesToSet(List<string> lines)
    {
        // FIXME
        throw new NotImplementedException();
    }

    public static HashSet<string> LoadStringSet(string filePath)
    {
        // FIXME
        throw new NotImplementedException();
    }

    // ========================================
    // STAGE 4
    // ========================================
    public static HashSet<T> LinesToSet<T>(List<string> lines, Func<string, T> transformer)
    {
        // FIXME
        throw new NotImplementedException();
    }

    public static HashSet<T> LoadSet<T>(string filePath)
    {
        // FIXME
        throw new NotImplementedException();
    }
}
.set format
This subject uses a fictional file type called the dot-set format.
The dot-set format is used to represent sets. The format is:
- All elements are made of non-special ASCII characters (A-Z, a-z, 0-9, all basic punctuation).
- One element per line. Line breaks can be either \r\n (Windows format) or \n. A parser must handle both.
- Empty lines must be ignored.
- Lines that immediately start with a # must be ignored: they represent comments, except in the following case.
  - Lines that start with #= represent a file import: those must not be ignored.
- Space and tab characters at the beginning of a line must be ignored, e.g. \t \t hello must be considered as just hello.
- Elements can be turned into any value. The transformation process is up to the implementation.
- Duplicated values (i.e. elements that lead to the same value) must be ignored (i.e. only add them once).
- If an element starts with #=, it must be processed specially. Read the file whose path follows the #=, relative to the location of the current file, and append all of its values into the set we are currently constructing. If a file is designated twice, it must be imported only once.
Here is an example:
Matthieu
Paul
Thomas
# I am a comment
Zoroark
Cobaltarrena
Qwarks
Zoroark
Cobaltarrena
Qwarks
In the end, we should get a set containing six values: Matthieu, Paul, Thomas, Zoroark, Cobaltarrena and Qwarks.
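To illustrate the duplicate rule on this example, here is a minimal sketch that feeds the element lines into a HashSet<string>. It assumes the lines have already been read and cleaned (comment removed); the class and method names are made up for this illustration and are not part of the skeleton.

```csharp
using System;
using System.Collections.Generic;

public static class DuplicateDemo
{
    // Hypothetical helper: builds the set for the example above from
    // element lines that were already cleaned (no comments, no blanks).
    public static HashSet<string> BuildExampleSet()
    {
        var elements = new List<string>
        {
            "Matthieu", "Paul", "Thomas",
            "Zoroark", "Cobaltarrena", "Qwarks",
            "Zoroark", "Cobaltarrena", "Qwarks",
        };

        var set = new HashSet<string>();
        foreach (string element in elements)
        {
            // Add returns false and leaves the set unchanged
            // when the value is already present.
            set.Add(element);
        }
        return set;
    }
}
```

Calling DuplicateDemo.BuildExampleSet() gives a set whose Count is 6, even though nine lines were added.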
Lines Reader + Filter
This function will be responsible for reading our file and excluding lines we’re not interested in.
Implement a function that returns a list of lines. It must:
- Read each line from the file at path. You must handle both \r\n and \n line endings.
- If the line is empty or starts with a #, unless it starts with #=, go to the next line.
- Remove spaces and tabs at the beginning of the line.
- Add the line (without the line ending character(s)) to the end of the list.
public static List<string> LoadAndFilterLines(string path);
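The steps above could be sketched as follows. This is one possible approach, not the reference solution: LineFilterSketch, FilterLines and LoadAndFilter are hypothetical names, and the sketch applies the steps in the order they are listed, so a comment is only recognised when # is the very first character of the line.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class LineFilterSketch
{
    // Hypothetical helper: applies the filtering rules to lines that
    // have already been split (line endings removed).
    public static List<string> FilterLines(IEnumerable<string> rawLines)
    {
        var result = new List<string>();
        foreach (string raw in rawLines)
        {
            bool isComment = raw.StartsWith("#", StringComparison.Ordinal)
                          && !raw.StartsWith("#=", StringComparison.Ordinal);
            if (raw.Length == 0 || isComment)
            {
                continue; // Skip empty lines and comments.
            }

            // Strip leading spaces and tabs only; the rest is kept as-is.
            result.Add(raw.TrimStart(' ', '\t'));
        }
        return result;
    }

    // File.ReadLines splits on both "\r\n" and "\n" and drops the
    // line ending, so no manual handling is needed here.
    public static List<string> LoadAndFilter(string path)
    {
        return FilterLines(File.ReadLines(path));
    }
}
```

Separating the filtering from the file access is a design choice made here so that the rules can be tried on an in-memory list without creating files.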
Stage 2: Expanding
The Problem
We now need to implement an expansion mechanism. The idea behind it is to replace every line that starts with #= with the values contained in the file named next to it. For example:
# file1.set
One
Two
#=file2.set
Four
# file2.set
Three
Our end result should contain, in that order, One, Two, Three and Four.
It is also possible to have cyclic dependencies, such as:
# file1.set
One
Two
#=file2.set
# file2.set
Three
#=file3.set
# file3.set
Four
#=file1.set
Since the format tells us that any file must be imported at most once, this is not really a problem. The end result here would be One, Two, Three and Four.
We do have to be careful about a specific case though, and that’s the following:
# file1.set
One
#=file2.set
#=./file2.set
# file2.set
Two
Although file2.set and ./file2.set are different strings, they represent the same file, and the end result must be One, Two.
Note that paths are also resolved relative to where we currently are. In the examples above, this just means that we will look for ./file2.set in the directory where file1.set is.
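A small sketch of that resolution step, assuming the usual System.IO helpers are acceptable here. PathSketch and ResolveImport are hypothetical names introduced for illustration only.

```csharp
using System;
using System.IO;

public static class PathSketch
{
    // Hypothetical helper: resolves an import path relative to the
    // directory of the file that contains the "#=" line.
    public static string ResolveImport(string currentFile, string importPath)
    {
        string currentFull = Path.GetFullPath(currentFile);

        // Directory.GetParent returns null only for a root path.
        var parent = Directory.GetParent(currentFull);
        string directory = parent != null ? parent.FullName : currentFull;

        // GetFullPath removes "." and ".." segments, so different
        // spellings of the same location give the same string.
        return Path.GetFullPath(Path.Combine(directory, importPath));
    }
}
```

With this helper, ResolveImport("data/file1.set", "file2.set") and ResolveImport("data/file1.set", "./file2.set") return the same string, which makes the result usable as a key for "already imported" checks.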
The Solution
If we think about the problem at hand, our function could be summed up like this:
load_file(f: real path of the file, already_loaded: list) => list:
    if f is in already_loaded => return empty list
    add f to already_loaded
    load lines of f
    create an output list
    for each line of f:
        if the line represents an import:
            compute the real path of the line compared to f
            add everything contained in load_file(the file to import) to the output list
        else:
            add the line to the output list
    return the output list
Implement this algorithm in C# in the following function:
public static List<string> LoadAndExpand(string filePath);
You can use the function you previously wrote.
For computing real paths, you will need:
- Directory.GetParent to get the directory containing a file
- Path.GetFullPath to get the actual path of a relative path based on a given folder.
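One way the algorithm and the two path helpers could fit together is sketched below. Several things here are assumptions rather than requirements of the subject: the names ExpandSketch, Expand and ExpandFile are hypothetical, the readLines parameter stands in for the Stage 1 function, a HashSet<string> replaces the already_loaded list of the pseudocode, and whitespace around the path after #= is trimmed.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class ExpandSketch
{
    // Hypothetical entry point: readLines stands in for the Stage 1
    // function so the sketch does not depend on its implementation.
    public static List<string> Expand(
        string filePath, Func<string, List<string>> readLines)
    {
        return ExpandFile(Path.GetFullPath(filePath), readLines,
                          new HashSet<string>());
    }

    private static List<string> ExpandFile(
        string fullPath,
        Func<string, List<string>> readLines,
        HashSet<string> alreadyLoaded)
    {
        var output = new List<string>();

        // Add returns false if the path was already recorded, which
        // covers both repeated and cyclic imports.
        if (!alreadyLoaded.Add(fullPath))
        {
            return output;
        }

        var parent = Directory.GetParent(fullPath);
        string directory = parent != null ? parent.FullName : fullPath;

        foreach (string line in readLines(fullPath))
        {
            if (line.StartsWith("#=", StringComparison.Ordinal))
            {
                // Assumption: whitespace around the path is not significant.
                string relative = line.Substring(2).Trim();
                string target = Path.GetFullPath(Path.Combine(directory, relative));
                output.AddRange(ExpandFile(target, readLines, alreadyLoaded));
            }
            else
            {
                output.Add(line);
            }
        }
        return output;
    }
}
```

The set of loaded paths is shared across the whole recursion, so a file reached through two different import chains is still expanded only once.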
Stage 3: Transformation
Now that we are able to get a clean list of lines, it’s time to construct our set. The case of strings is the easiest one by far. Simply consider that the transformation is just leaving the string intact.
public static HashSet<string> LinesToSet(List<string> lines);
Let’s also make the final function that takes care of loading and expanding our files, then turns them into a set of strings:
public static HashSet<string> LoadStringSet(string filePath);
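For the string case, a minimal sketch could rely on the HashSet<string> constructor that accepts a collection. StringSetSketch and ToSet are hypothetical names used only for this illustration.

```csharp
using System;
using System.Collections.Generic;

public static class StringSetSketch
{
    // The HashSet constructor copies the items of the list and keeps
    // each distinct string only once.
    public static HashSet<string> ToSet(List<string> lines)
    {
        return new HashSet<string>(lines);
    }
}
```

String comparison is case-sensitive by default here, so "Paul" and "paul" would count as two different elements.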
Stage 4: Generic Transformation
Let’s now make a generic version. We want to be able to handle anything, but we do not necessarily know how to create “something” from a string, since that “something” can be anything.
The solution here is to add a transformation function as a parameter of our functions, like so:
public static HashSet<T> LinesToSet<T>(List<string> lines, Func<string, T> transformer);
Then write the final function:
public static HashSet<T> LoadSet<T>(string filePath);
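As an illustration of passing a function as a parameter, the sketch below applies a Func<string, T> to every line before adding the result to the set. GenericSetSketch and ToSet are hypothetical names. The LoadSet<T> signature above takes no transformer, and how it obtains one is not specified here, so this sketch only covers the list-to-set step.

```csharp
using System;
using System.Collections.Generic;

public static class GenericSetSketch
{
    // Converts each line with the supplied function, then stores the
    // result; the set itself takes care of duplicates.
    public static HashSet<T> ToSet<T>(List<string> lines, Func<string, T> transformer)
    {
        var set = new HashSet<T>();
        foreach (string line in lines)
        {
            // Two different lines that map to the same value
            // (e.g. "1" and "01" as integers) are stored once.
            set.Add(transformer(line));
        }
        return set;
    }
}
```

For example, GenericSetSketch.ToSet(new List<string> { "1", "01", "2" }, s => int.Parse(s)) produces a HashSet<int> containing 1 and 2: the duplicate is detected on the transformed value, not on the original text.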