It is possible to search both individual hits and full clusters with different queries.
The interface also has an analyze tool that generates graphs from the search query.
These graphs give a clearer picture of how the hits are distributed over time, along with other information about the clusters.
Here are instructions on how to use the search interface.
The basic principle is to enter a word in the search field.
This returns all hits that contain the search word.
Every hit shows its cluster number; clicking the number opens the whole cluster of similar hits.
The search engine is based on Solr, so it also supports much more advanced queries. You can search with or without naming the field to look in, e.g. 'text:kissa' or just 'kissa'. If you do not define a field, all available fields are searched for the term. Advanced queries, however, require the field. Available fields are listed at the bottom of these instructions.
Below are the basic building blocks of a query, but any query that works in Solr will work here.
+word - The word 'word' must be in the hit.
-word - The word 'word' must NOT be in the hit.
These can also be combined together:
+first_word -second_word - Must have word 'first_word', but must not have word 'second_word'.
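If you generate queries from a script rather than typing them, the + and - prefixes can be attached mechanically. The following is a minimal sketch; the function name build_query and the example words are illustrative and not part of the search interface.

```python
def build_query(required=(), prohibited=()):
    """Join words into a query string using the + and - prefixes."""
    parts = [f"+{word}" for word in required]
    parts += [f"-{word}" for word in prohibited]
    return " ".join(parts)


# Must contain 'kissa', must not contain 'koira'.
print(build_query(required=["kissa"], prohibited=["koira"]))  # +kissa -koira
```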
Basic boolean operators can also be used:
text:first_word AND text:second_word - Must have both words.
text:first_word OR text:second_word - Must have either first_word or second_word.
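Boolean queries over a list of words can be assembled the same way. This sketch assumes a hypothetical helper named combine; note that Solr's standard query parser expects the operators in capital letters, which is why the helper rejects anything else.

```python
def combine(field, words, operator="AND"):
    """Build e.g. 'text:a AND text:b' from a field name and a list of words."""
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be 'AND' or 'OR'")
    return f" {operator} ".join(f"{field}:{word}" for word in words)


print(combine("text", ["first_word", "second_word"]))
# text:first_word AND text:second_word
print(combine("text", ["first_word", "second_word"], operator="OR"))
# text:first_word OR text:second_word
```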
Searching for a phrase works as well:
"a full phrase" - Searches for an exact match.
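When a phrase is built by a script, any double quote or backslash inside it has to be escaped with a backslash so that it does not end the phrase early. A small sketch, where the helper name phrase is an assumption made for illustration:

```python
def phrase(text, field=None):
    """Wrap text in double quotes for an exact phrase search."""
    # Escape backslashes first, then double quotes.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    quoted = f'"{escaped}"'
    return f"{field}:{quoted}" if field else quoted


print(phrase("a full phrase"))                # "a full phrase"
print(phrase("a full phrase", field="text"))  # text:"a full phrase"
```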
Fuzzy search works for individual words and proximity search for phrases:
text:word~ - Fuzzy match: also finds words that differ from 'word' by a character or two.
text:"first_word second_word third_word"~20 - All words must be within 20 words of each other.
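The tilde syntax is easy to mistype, so a script might build it with small helpers. This is only a sketch: the names fuzzy and proximity are hypothetical, and the default field 'text' is taken from the examples above.

```python
def fuzzy(word, field="text"):
    """Fuzzy query for a single word, e.g. text:kissa~"""
    return f"{field}:{word}~"


def proximity(words, distance, field="text"):
    """Proximity query: the words must occur within `distance` words."""
    joined = " ".join(words)
    return f'{field}:"{joined}"~{distance}'


print(fuzzy("kissa"))
# text:kissa~
print(proximity(["first_word", "second_word", "third_word"], 20))
# text:"first_word second_word third_word"~20
```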
Wildcards work normally:
word* - Search for words that start with 'word', with any ending (or none).
word? - Search for words that consist of 'word' followed by exactly one more character.
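To get a feel for what the two wildcards match, Python's fnmatch module uses the same * and ? conventions. The sketch below only imitates the matching on a made-up word list; it does not contact the search engine, and real results also depend on how Solr has indexed the text.

```python
from fnmatch import fnmatchcase

# Hypothetical word list, for illustration only.
words = ["word", "words", "wordy", "wording", "sword"]


def matching(pattern, candidates):
    """Return the candidates that match the whole wildcard pattern."""
    return [w for w in candidates if fnmatchcase(w, pattern)]


print(matching("word*", words))  # ['word', 'words', 'wordy', 'wording']
print(matching("word?", words))  # ['words', 'wordy']
```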
Defining the field to search in:
text:first_word - Search for 'first_word' in the 'text' field.
count:[50 TO *] - Count is 50 or greater.
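Range queries follow one pattern: square brackets include both end points, and * leaves an end open. A minimal sketch with a hypothetical helper named range_query; applying it to 'year' assumes that field accepts ranges in the same way as 'count'.

```python
def range_query(field, low=None, high=None):
    """Build an inclusive range query; None means an open end (*)."""
    lower = "*" if low is None else low
    upper = "*" if high is None else high
    return f"{field}:[{lower} TO {upper}]"


print(range_query("count", low=50))      # count:[50 TO *]
print(range_query("year", 1850, 1900))   # year:[1850 TO 1900]
```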
These can all be combined in any way:
+text:first_word -text:second_word +title:Aamu* - Texts that have first_word, don't have second_word and the title starts with 'Aamu'.
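If a combined query like this is placed in a URL by a script, it needs to be percent-encoded, because a bare + in a URL query string is read as a space. The sketch below uses Python's standard library; the address https://example.org/search and the parameter name q are placeholders, not the real address of this interface.

```python
from urllib.parse import parse_qs, urlencode

query = "+text:first_word -text:second_word +title:Aamu*"

# urlencode turns '+' into %2B and spaces into '+', so the query
# survives the trip through a URL unchanged.
encoded = urlencode({"q": query})
url = "https://example.org/search?" + encoded  # placeholder address
print(url)

# Decoding gives back the original query string.
decoded = parse_qs(encoded)["q"][0]
print(decoded == query)  # True
```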
Available fields when searching for hits:
year - Publication year of the issue where the hit comes from.
date - Full date of the issue.
filename - Name of the file.
language - Language of the issue.
cluster_id - ID of the cluster where the hit belongs.
location - Printing location of the issue.
url - URL to view the original pictures in Kansalliskirjasto's digi archives.
title - Title of the issue.
text - The actual text of the hit.
label - Whether the issue is a newspaper or journal.
Available fields when searching for clusters:
start_label - Label of the first hit in the cluster.
start_location - Location of the first hit.
start_language - Language of the first hit.
avglength - Average length of the hits in the cluster.
gap - Maximum gap in years where no reprints were detected.
occyear - Year of the first hit.
cluster_id - ID of the cluster.
count - Number of hits in the cluster.
span - The span between first and last hit in the cluster.
virality_score - Virality score of the cluster. More information about this can be found on the about page.