Rodrigo Rosenfeld Rosas
What did I learn about Code Writing?
I’ve being coding for about 2 decades now. And still I don’t find it to be an exact science, as some would like to suppose. Otherwise, they wouldn’t ask you for time estimates on feature requests or try to use tools like MS Project to manage a software project, as if Gantt charts could be useful for this kind of project.
Of course, I can completely understand the reasons for the ones willing to bring such project management tools to the software world. Good luck to them! But I won’t talk about this subject in this article as it is too big. I would just like to state that software can be better understood when compared to sciences like Music or general Arts.
Both require lots of experience, personal feelings and are hard to estimate conclusion times since it is almost always something completely new. Although there are some recipes for certain kinds of music or movies, but then they are no longer art.
Some time ago I was asked to estimate how long it would take for me to implement a search system over some HTML documents taken from EDGAR filings. I’m pretty sure that this wouldn’t be something new for some of you who have already had experience with search engines before, but that wasn’t my case definitely. So, I knew I should research about tools like Lucene for search indexing, but I have never worked with them before. So how could I estimate this?
As I started following the tutorials, I thought the main problem was solved in the first 2 days, but I couldn’t predict that I would spend so much time reading about the configuration files for Solr, and how search params could be adjusted. There is a lot of stuff to know about and configure for your needs.
Particularly, one of the curiosities I’ve noticed is that even if my configuration was set to enable AND-like search for all typed terms, if it happens for a user to prepend some word with a plus (“+”) or minus (“-”), then non-prepended words would become optional. I had enabled the DisMax mode, by the way.
The challenge
So, I’d like to talk specifically about this specific challenge as it is a good example for demonstrating some techniques I’ve learned last year after reading Clean Code. Although being very Java-oriented, this book has a few simple rules that can be applied to every language and be really effective. Just like in Music and Movie Making, Software Writing is also a science in which there are lots of resources to learn from and that can be used in a systematic way. Learning those tools and techniques will help developers to deliver more in less time.
Developers should invest time on well-written code because they’ll spend most of their time reading code. So, it makes sense to invest time and money on tools that will make it easier to browse some code as well as investing some time polishing their code so that they become more readable too.
Before talking about those simple rules, I’d like to show you how I might write this code in my early times. Don’t waste your time trying to understand this code. Then, I’ll show you the code that I’ve actually written in a couple of hours, exactly as I have estimated before, since it didn’t have any external dependencies. So, basically, this is the trend:
Transform terms like ‘some +required -not-allowed “any phrase” id:(10 or 20 or 30)’ into ‘+some +required -not-allowed +“any phrase” +id:(10 or 20 or 30)’.
Pretty simple, right? But even software like this can be bug-prone. So, here is a poor implementation (in Groovy, as I’m a Grails programmer in my current job). Don’t try to really understand it (more on this later), just take a look at the code (dis)organization. I didn’t even try to compile it.
How not to code
1 | class SolrService { |
2 | ... |
3 | private String processQuery(String query) { |
4 | query = query.replaceAll('#', '') |
5 | def expressions = [], matches |
6 | while (matches = query =~ /\([^\(]*?\)/) { |
7 | matches.each { match -> |
8 | expressions << match |
9 | query = query.replace(match, "#{${expressions.size()}}".toString()) |
10 | } |
11 | } |
12 | (query =~ /\".*?\"/).each { match -> |
13 | expressions << match |
14 | query = query.replace(match, "#{${expressions.size()}}".toString()) |
15 | } |
16 | query = query.split(' ').findAll{it}.collect { word -> |
17 | word[0] in ['-', '+'] ? word : "+${word}" |
18 | }.join(' ') |
19 | def s = expressions.size() |
20 | expressions.reverse().eachWithIndex { expression, i -> |
21 | query = query.replace("#{${s - i}}", expression) |
22 | } |
23 | } |
24 | |
25 | def search(query) { |
26 | query = processQuery(query) |
27 | ... |
28 | return solrServer.request(new SolrQuery(query)) |
29 | } |
30 | } |
Ok, I’ll agree that for this specific case, the code may be not that bad, but although processQuery is not that big, you’ll need some time for figuring it out what is happened if you’re required to modify this method.
Also, looking at it, could you be sure it will work for all cases? Or could you tell me what is the reason for some specific line? What is this code protected from? How comfortable would you be if you were to modify this code? How would you write automated tests for processQuery?
Also, as the logic gets more complex, coding this way could led to some messy code like the one I’ve just taken from a file in the project that integrates Hibernate to Grails:
1 | // grails-core/grails-hibernate/src/main/groovy/grails/orm/HibernateCriteriaBuilder.java |
2 | // ... |
3 | @SuppressWarnings("rawtypes") |
4 | @Override |
5 | public Object invokeMethod(String name, Object obj) { |
6 | Object[] args = obj.getClass().isArray() ? (Object[])obj : new Object[]{obj}; |
7 | |
8 | if (paginationEnabledList && SET_RESULT_TRANSFORMER_CALL.equals(name) && args.length == 1 && |
9 | args[0] instanceof ResultTransformer) { |
10 | resultTransformer = (ResultTransformer) args[0]; |
11 | return null; |
12 | } |
13 | |
14 | if (isCriteriaConstructionMethod(name, args)) { |
15 | if (criteria != null) { |
16 | throwRuntimeException(new IllegalArgumentException("call to [" + name + "] not supported here")); |
17 | } |
18 | |
19 | if (name.equals(GET_CALL)) { |
20 | uniqueResult = true; |
21 | } |
22 | else if (name.equals(SCROLL_CALL)) { |
23 | scroll = true; |
24 | } |
25 | else if (name.equals(COUNT_CALL)) { |
26 | count = true; |
27 | } |
28 | else if (name.equals(LIST_DISTINCT_CALL)) { |
29 | resultTransformer = CriteriaSpecification.DISTINCT_ROOT_ENTITY; |
30 | } |
31 | |
32 | createCriteriaInstance(); |
33 | |
34 | // Check for pagination params |
35 | if (name.equals(LIST_CALL) && args.length == 2) { |
36 | paginationEnabledList = true; |
37 | orderEntries = new ArrayList<Order>(); |
38 | invokeClosureNode(args[1]); |
39 | } |
40 | else { |
41 | invokeClosureNode(args[0]); |
42 | } |
43 | |
44 | if (resultTransformer != null) { |
45 | criteria.setResultTransformer(resultTransformer); |
46 | } |
47 | Object result; |
48 | if (!uniqueResult) { |
49 | if (scroll) { |
50 | result = criteria.scroll(); |
51 | } |
52 | else if (count) { |
53 | criteria.setProjection(Projections.rowCount()); |
54 | result = criteria.uniqueResult(); |
55 | } |
56 | else if (paginationEnabledList) { |
57 | // Calculate how many results there are in total. This has been |
58 | // moved to before the 'list()' invocation to avoid any "ORDER |
59 | // BY" clause added by 'populateArgumentsForCriteria()', otherwise |
60 | // an exception is thrown for non-string sort fields (GRAILS-2690). |
61 | criteria.setFirstResult(0); |
62 | criteria.setMaxResults(Integer.MAX_VALUE); |
63 | |
64 | // Restore the previous projection, add settings for the pagination parameters, |
65 | // and then execute the query. |
66 | if (projectionList != null && projectionList.getLength() > 0) { |
67 | criteria.setProjection(projectionList); |
68 | } else { |
69 | criteria.setProjection(null); |
70 | } |
71 | for (Order orderEntry : orderEntries) { |
72 | criteria.addOrder(orderEntry); |
73 | } |
74 | if (resultTransformer == null) { |
75 | criteria.setResultTransformer(CriteriaSpecification.ROOT_ENTITY); |
76 | } |
77 | else if (paginationEnabledList) { |
78 | // relevant to GRAILS-5692 |
79 | criteria.setResultTransformer(resultTransformer); |
80 | } |
81 | // GRAILS-7324 look if we already have association to sort by |
82 | Map argMap = (Map)args[0]; |
83 | final String sort = (String) argMap.get(GrailsHibernateUtil.ARGUMENT_SORT); |
84 | if (sort != null) { |
85 | boolean ignoreCase = true; |
86 | Object caseArg = argMap.get(GrailsHibernateUtil.ARGUMENT_IGNORE_CASE); |
87 | if (caseArg instanceof Boolean) { |
88 | ignoreCase = (Boolean) caseArg; |
89 | } |
90 | final String orderParam = (String) argMap.get(GrailsHibernateUtil.ARGUMENT_ORDER); |
91 | final String order = GrailsHibernateUtil.ORDER_DESC.equalsIgnoreCase(orderParam) ? |
92 | GrailsHibernateUtil.ORDER_DESC : GrailsHibernateUtil.ORDER_ASC; |
93 | int lastPropertyPos = sort.lastIndexOf('.'); |
94 | String associationForOrdering = lastPropertyPos >= 0 ? sort.substring(0, lastPropertyPos) : null; |
95 | if (associationForOrdering != null && aliasMap.containsKey(associationForOrdering)) { |
96 | addOrder(criteria, aliasMap.get(associationForOrdering) + "." + sort.substring(lastPropertyPos + 1), |
97 | order, ignoreCase); |
98 | // remove sort from arguments map to exclude from default processing. |
99 | @SuppressWarnings("unchecked") Map argMap2 = new HashMap(argMap); |
100 | argMap2.remove(GrailsHibernateUtil.ARGUMENT_SORT); |
101 | argMap = argMap2; |
102 | } |
103 | } |
104 | GrailsHibernateUtil.populateArgumentsForCriteria(grailsApplication, targetClass, criteria, argMap); |
105 | GrailsHibernateTemplate ght = new GrailsHibernateTemplate(sessionFactory, grailsApplication); |
106 | PagedResultList pagedRes = new PagedResultList(ght, criteria); |
107 | result = pagedRes; |
108 | } |
109 | else { |
110 | result = criteria.list(); |
111 | } |
112 | } |
113 | else { |
114 | result = GrailsHibernateUtil.unwrapIfProxy(criteria.uniqueResult()); |
115 | } |
116 | if (!participate) { |
117 | hibernateSession.close(); |
118 | } |
119 | return result; |
120 | } |
121 | |
122 | if (criteria == null) createCriteriaInstance(); |
123 | |
124 | MetaMethod metaMethod = getMetaClass().getMetaMethod(name, args); |
125 | if (metaMethod != null) { |
126 | return metaMethod.invoke(this, args); |
127 | } |
128 | |
129 | metaMethod = criteriaMetaClass.getMetaMethod(name, args); |
130 | if (metaMethod != null) { |
131 | return metaMethod.invoke(criteria, args); |
132 | } |
133 | metaMethod = criteriaMetaClass.getMetaMethod(GrailsClassUtils.getSetterName(name), args); |
134 | if (metaMethod != null) { |
135 | return metaMethod.invoke(criteria, args); |
136 | } |
137 | |
138 | if (isAssociationQueryMethod(args) || isAssociationQueryWithJoinSpecificationMethod(args)) { |
139 | final boolean hasMoreThanOneArg = args.length > 1; |
140 | Object callable = hasMoreThanOneArg ? args[1] : args[0]; |
141 | int joinType = hasMoreThanOneArg ? (Integer)args[0] : CriteriaSpecification.INNER_JOIN; |
142 | |
143 | if (name.equals(AND) || name.equals(OR) || name.equals(NOT)) { |
144 | if (criteria == null) { |
145 | throwRuntimeException(new IllegalArgumentException("call to [" + name + "] not supported here")); |
146 | } |
147 | |
148 | logicalExpressionStack.add(new LogicalExpression(name)); |
149 | invokeClosureNode(callable); |
150 | |
151 | LogicalExpression logicalExpression = logicalExpressionStack.remove(logicalExpressionStack.size()-1); |
152 | addToCriteria(logicalExpression.toCriterion()); |
153 | |
154 | return name; |
155 | } |
156 | |
157 | if (name.equals(PROJECTIONS) && args.length == 1 && (args[0] instanceof Closure)) { |
158 | if (criteria == null) { |
159 | throwRuntimeException(new IllegalArgumentException("call to [" + name + "] not supported here")); |
160 | } |
161 | |
162 | projectionList = Projections.projectionList(); |
163 | invokeClosureNode(callable); |
164 | |
165 | if (projectionList != null && projectionList.getLength() > 0) { |
166 | criteria.setProjection(projectionList); |
167 | } |
168 | |
169 | return name; |
170 | } |
171 | |
172 | final PropertyDescriptor pd = BeanUtils.getPropertyDescriptor(targetClass, name); |
173 | if (pd != null && pd.getReadMethod() != null) { |
174 | ClassMetadata meta = sessionFactory.getClassMetadata(targetClass); |
175 | Type type = meta.getPropertyType(name); |
176 | if (type.isAssociationType()) { |
177 | String otherSideEntityName = |
178 | ((AssociationType) type).getAssociatedEntityName((SessionFactoryImplementor) sessionFactory); |
179 | Class oldTargetClass = targetClass; |
180 | targetClass = sessionFactory.getClassMetadata(otherSideEntityName).getMappedClass(EntityMode.POJO); |
181 | if (targetClass.equals(oldTargetClass) && !hasMoreThanOneArg) { |
182 | joinType = CriteriaSpecification.LEFT_JOIN; // default to left join if joining on the same table |
183 | } |
184 | associationStack.add(name); |
185 | final String associationPath = getAssociationPath(); |
186 | createAliasIfNeccessary(name, associationPath,joinType); |
187 | // the criteria within an association node are grouped with an implicit AND |
188 | logicalExpressionStack.add(new LogicalExpression(AND)); |
189 | invokeClosureNode(callable); |
190 | aliasStack.remove(aliasStack.size() - 1); |
191 | if (!aliasInstanceStack.isEmpty()) { |
192 | aliasInstanceStack.remove(aliasInstanceStack.size() - 1); |
193 | } |
194 | LogicalExpression logicalExpression = logicalExpressionStack.remove(logicalExpressionStack.size()-1); |
195 | if (!logicalExpression.args.isEmpty()) { |
196 | addToCriteria(logicalExpression.toCriterion()); |
197 | } |
198 | associationStack.remove(associationStack.size()-1); |
199 | targetClass = oldTargetClass; |
200 | |
201 | return name; |
202 | } |
203 | } |
204 | } |
205 | else if (args.length == 1 && args[0] != null) { |
206 | if (criteria == null) { |
207 | throwRuntimeException(new IllegalArgumentException("call to [" + name + "] not supported here")); |
208 | } |
209 | |
210 | Object value = args[0]; |
211 | Criterion c = null; |
212 | if (name.equals(ID_EQUALS)) { |
213 | return eq("id", value); |
214 | } |
215 | |
216 | if (name.equals(IS_NULL) || |
217 | name.equals(IS_NOT_NULL) || |
218 | name.equals(IS_EMPTY) || |
219 | name.equals(IS_NOT_EMPTY)) { |
220 | if (!(value instanceof String)) { |
221 | throwRuntimeException(new IllegalArgumentException("call to [" + name + "] with value [" + |
222 | value + "] requires a String value.")); |
223 | } |
224 | String propertyName = calculatePropertyName((String)value); |
225 | if (name.equals(IS_NULL)) { |
226 | c = Restrictions.isNull(propertyName); |
227 | } |
228 | else if (name.equals(IS_NOT_NULL)) { |
229 | c = Restrictions.isNotNull(propertyName); |
230 | } |
231 | else if (name.equals(IS_EMPTY)) { |
232 | c = Restrictions.isEmpty(propertyName); |
233 | } |
234 | else if (name.equals(IS_NOT_EMPTY)) { |
235 | c = Restrictions.isNotEmpty(propertyName); |
236 | } |
237 | } |
238 | |
239 | if (c != null) { |
240 | return addToCriteria(c); |
241 | } |
242 | } |
243 | |
244 | throw new MissingMethodException(name, getClass(), args); |
245 | } |
246 | // ... |
I do really hope never to have to understand such code… I’d be curious to find how would such an automated test be written for this invokeMethod, as I couldn’t find the tests in this project.
What is wrong with this code?
Back to the original implementation, what would be wrong with such code?
- It takes a lot of time for understand the code (the read cost);
- It is hard to test;
- Parsing the query is too much responsibility the SolrService class;
- If you just need some way of indexing and searching results in stored documents, you shouldn’t be relying in a specific solution, like Solr. If you decide later to change your Search solution from Solr to Elastic Search or using Lucene directly, you’ll need to change a lot of code. It would be better to have a wrapper for something simple like this;
- It can become hard to change/maintain/debug;
Even if you try to split processQuery into smaller methods, you would be required to pass some common values over and over again, like query and the expressions array, that would not only be an in-parameter but would be an out-parameter too as it would have to be changed inside some methods… When that happens, it is a hint that the overall code needs a separate class for doing the job. This is one of the simple rules I’ve learned in Clean Code.
The simple rules of Clean Code
The top-bottom code writing approach
While reading the original example, the first thing you’ll see is the processQuery method declared in the SolrService class. What does it do? Why do we need it? Who is using it? Only when we look forward, we’ll be able to detect that it is being used from the search method.
I was always used to write code that way, writing the least dependent methods first and the higher level ones as the latest ones. I guess I thought they should be declared first before they could be mentioned. Maybe that was true for some procedural languages I’ve started with before my first experience with OOP while reading a book about C++.
But in all OO languages I know about, it is ok to declare your methods in any order. Writing them top down makes it easier for another reader to understand your code because he/she will read your high-level instructions first.
Avoid more than 5 lines in a single method
Keeping your methods really small will make it easier to understand them and to write unit tests against them too. They’ll also be less error-prone.
Avoid methods with more than 2 or 3 parameters
Having lots of parameters in methods makes it really complicate to associate what is the meaning of each parameter. Looking at this code, could you understand what is the meaning of the last parameters?
1 | request.setAction(ACTION.POST, true, true, 10, false) |
You’d certainly have to checkout the API for AbstractUpdateRequest.
Avoid out-parameters
This is a typical example where you’d probably be better served by a separate class.
When you find out some situation where you’d like to return multiple values (I’m not talking about returning a single list, here) and you need some parameter for returning them (and out-parameter), you should reconsider if you’re taking the right path.
Also, you should really try to avoid modifying any parameter as debugging such code can be really frustrating.
Provide good names for your variables, classes and methods
This one is gold. Good names are essential for a good code reading experience. It can save you several hours trying to understand some snippet of code.
Take a look at the signature of the invokeMethod method in the Grails-Hibernate integration example code:
1 | Object invokeMethod(String name, Object obj) |
Wouldn’t it be easier to understand what it does if the signature was changed to this one?
1 | Object invokeMethodWith(String methodName, Object methodArguments) |
2 | |
3 | // code would look like (just supposing, I'm not sure): |
4 | criteria.invokeMethodWith("eq", [attributeName, expectedValue]) |
What does “obj” mean in the actual implementation? It could be anything with such generic description. Investing some time choosing good names for your methods and variables can save a lot of time from others trying to understand what the code does.
The result of applying such simple rules
Just by making use of those simple rules, you’ll be able to:
- Easily read your code;
- Easily write your unit tests;
- Easily modify and evolve your logic;
- Have a well-documented code;
- Get rid of some otherwise hard-to-find bugs.
Some extra rules
Some rules I’ve being using for my entire life and I’m not sure if they are all documented in the Clean Code book or not. But I’d like to talk a bit about them too.
Don’t use deep-depth blocks for handling validation rules
I’ve seen pseudo-code like this, so many times:
1 | declare square_root(number) { |
2 | if (number >= 0) { |
3 | do_real_calculations_with(number) |
4 | } |
5 | } |
Often, there are even more validation rules inside each block and this style gets really hard to read. And, worse than that, it is only protecting the software from crashing or generating an unexpected exception, but it does not properly handle bad inputs (negative numbers).
Also, usually do_real_calculations_with(number) is written as pages of code in a way you won’t be able to see the enclosing brackets of the block in a single page. Take a look again at the Hibernate-Grails integration code to see if you can easily find out where the block beginning at “if (isCriteriaConstructionMethod(name, args)) {” ends.
Even when you don’t have to do anything if the necessary conditions are not met, I’d rather code this way:
1 | declare square_root(number) { |
2 | if (number < 0) return // or raise "Taking the square root of negative numbers is not supported by this implementation" |
3 | do_real_calculations_with(number) |
4 | } |
This is a real example found in PersistentManagerBase.java from the Tomcat project.
1 | protected void processMaxIdleSwaps() { |
2 | |
3 | if (!getState().isAvailable() || maxIdleSwap < 0) |
4 | return; |
5 | |
6 | Session sessions[] = findSessions(); |
7 | long timeNow = System.currentTimeMillis(); |
8 | |
9 | // Swap out all sessions idle longer than maxIdleSwap |
10 | if (maxIdleSwap >= 0) { |
11 | for (int i = 0; i < sessions.length; i++) { |
12 | StandardSession session = (StandardSession) sessions[i]; |
13 | synchronized (session) { |
14 | if (!session.isValid()) |
15 | continue; |
16 | int timeIdle = // Truncate, do not round up |
17 | (int) ((timeNow - session.getThisAccessedTime()) / 1000L); |
18 | if (timeIdle > maxIdleSwap && timeIdle > minIdleSwap) { |
19 | if (session.accessCount != null && |
20 | session.accessCount.get() > 0) { |
21 | // Session is currently being accessed - skip it |
22 | continue; |
23 | } |
24 | if (log.isDebugEnabled()) |
25 | log.debug(sm.getString |
26 | ("persistentManager.swapMaxIdle", |
27 | session.getIdInternal(), |
28 | Integer.valueOf(timeIdle))); |
29 | try { |
30 | swapOut(session); |
31 | } catch (IOException e) { |
32 | // This is logged in writeSession() |
33 | } |
34 | } |
35 | } |
36 | } |
37 | } |
38 | } |
It is hard to see what bracket is closing which bracket in the end… This could be rewritten as:
1 | ... |
2 | if (maxIdleSwap < 0) return; |
3 | for (int i = 0; i < sessions.length; i++) { |
4 | ... |
5 | if (timeIdle <= maxIdleSwap || timeIdle < minIdleSwap) continue; |
6 | if (session.accessCount != null && session.accessCount.get() > 0) continue; |
7 | ... |
Simple code should be handled first
The pattern is:
1 | if some_condition |
2 | lots of lines of complex code handling here |
3 | else |
4 | simple handling for the case where some_condition is false |
Here is a concrete example taken from ActiveRecord::Explain.
1 | def logging_query_plan # :nodoc: |
2 | threshold = auto_explain_threshold_in_seconds |
3 | current = Thread.current |
4 | if threshold && current[:available_queries_for_explain].nil? |
5 | begin |
6 | queries = current[:available_queries_for_explain] = [] |
7 | start = Time.now |
8 | result = yield |
9 | logger.warn(exec_explain(queries)) if Time.now - start > threshold |
10 | result |
11 | ensure |
12 | current[:available_queries_for_explain] = nil |
13 | end |
14 | else |
15 | yield |
16 | end |
17 | end |
I would rather write such code as:
1 | def logging_query_plan # :nodoc: |
2 | threshold = auto_explain_threshold_in_seconds |
3 | current = Thread.current |
4 | return yield unless threshold && current[:available_queries_for_explain].nil? |
5 | queries = current[:available_queries_for_explain] = [] |
6 | start = Time.now |
7 | result = yield |
8 | logger.warn(exec_explain(queries)) if Time.now - start > threshold |
9 | result |
10 | ensure |
11 | current[:available_queries_for_explain] = nil |
12 | end |
Of course, this isn’t exactly the same as the original code in the case yield generates some exception for the “else” code, but I’m sure this could be worked around.
Don’t handle separate exceptions when you don’t need to
I’ve often found this pattern while reading Java code and I believe that is the result of using some Java IDE. The IDE will tell the developer that some exceptions were not handled and will automatically fill the code as:
1 | void myMethod() throws MyOwnException { |
2 | try { |
3 | someMethod() |
4 | } |
5 | catch(FileNotFoundException ex) { |
6 | throw MyOwnException("File was not found") |
7 | } |
8 | catch(WrongPermissionException ex) { |
9 | throw MyOwnException("You don't have the right permission to write to the file") |
10 | } |
11 | catch(CorruptFileException ex) { |
12 | throw MyOwnException("The file is corrupted") |
13 | } |
14 | ... |
15 | } |
If you’re only interested in gracefully handle exceptions to give your user a better feedback, why doesn’t you just write this instead:
1 | void myMethod() throws MyOwnException { |
2 | try { |
3 | someMethod() |
4 | } catch(Exception ex) { |
5 | log.error("Couldn't perform XYZ action", ex) |
6 | throw new MyOwnException("Sorry, couldn't perform XYZ action. Please contact our support team and we'll investigate this issue.") |
7 | } |
8 | } |
The original challenge actual implementation
And, finally, following those techniques, here is how I actually coded that original challenge and implemented the tests in JUnit:
1 | class SearchService { |
2 | ... |
3 | def search(query) { |
4 | query = new QueryProcessor(query).processedQuery |
5 | ... |
6 | new SearchResult(solrServer.request(new SolrQuery(query))) |
7 | } |
8 | } |
I’ll omit the implementation of SearchResult class, as it is irrelevant to this specific challenge. I just want to point out that I’ve abstracted the search feature in some wrapper classes for not exposing Solr internals.
And here is the real implementation code:
1 | package myappname.search |
2 | |
3 | /* Solr behaves in an uncommon way: |
4 | Even when configured for making an "AND" search, when a signal (+ or -) |
5 | is prepended to any word, the ones that are not prepended are considered optionals. |
6 | We don't want that, so we're prefixing all terms with a "+" unless they're already |
7 | prefixed. |
8 | */ |
9 | class QueryProcessor { |
10 | private query, expressions = [], words = [] |
11 | |
12 | QueryProcessor(query) { this.query = query } |
13 | |
14 | def getProcessedQuery() { |
15 | removeHashesFromQuery() |
16 | extractParenthesis() |
17 | extractQuotedText() |
18 | splitWords() |
19 | addPlusSignToUnsignedWords() |
20 | joinProcessedWords() |
21 | replaceExpressions() |
22 | query |
23 | } |
24 | |
25 | private removeHashesFromQuery() { query = query.replaceAll('#', '') } |
26 | |
27 | private extractParenthesis() { |
28 | def matches = query =~ /\([^\(]*?\)/ |
29 | if (!matches) return |
30 | replaceMatches(matches) |
31 | // keep trying in case of nested parenthesis |
32 | extractParenthesis() |
33 | } |
34 | |
35 | private replaceMatches(matches) { |
36 | matches.each { |
37 | expressions << it |
38 | query = query.replace(it, "#{${expressions.size()}}".toString()) |
39 | } |
40 | } |
41 | |
42 | private extractQuotedText() { |
43 | replaceMatches(query =~ /\".*?\"/) |
44 | } |
45 | |
46 | private splitWords() { |
47 | words = query.split(' ').findAll{it} |
48 | } |
49 | |
50 | private addPlusSignToUnsignedWords() { |
51 | words = words.collect { word -> |
52 | word[0] in ['-', '+'] ? word : "+${word}" |
53 | } |
54 | } |
55 | |
56 | private joinProcessedWords() { query = words.join(' ') } |
57 | |
58 | private replaceExpressions() { |
59 | def s = expressions.size() |
60 | expressions.reverse().eachWithIndex { expression, i -> |
61 | query = query.replace("#{${s - i}}", expression) |
62 | } |
63 | } |
64 | } |
And the unit tests:
1 | package myappname.search |
2 | |
3 | import org.junit.* |
4 | |
5 | class QueryProcessorTests { |
6 | @Test |
7 | void removeHashesFromQuery() { |
8 | def p = new QueryProcessor('some#hashes # in # query') |
9 | p.removeHashesFromQuery() |
10 | assert p.query == 'somehashes in query' |
11 | } |
12 | |
13 | @Test |
14 | void extractParenthesis() { |
15 | def p = new QueryProcessor('(abc (cde fgh)) no parenthesis transaction_id:(ijk) (lmn)') |
16 | p.extractParenthesis() |
17 | assert p.query == '#{4} no parenthesis transaction_id:#{2} #{3}' |
18 | assert p.expressions == ['(cde fgh)', '(ijk)', '(lmn)', '(abc #{1})'] |
19 | } |
20 | |
21 | @Test |
22 | void extractQuotedText() { |
23 | def p = new QueryProcessor('some "quoted" text and "some more"') |
24 | p.extractQuotedText() |
25 | assert p.query == 'some #{1} text and #{2}' |
26 | assert p.expressions == ['"quoted"', '"some more"'] |
27 | } |
28 | |
29 | @Test |
30 | void splitWords() { |
31 | def p = new QueryProcessor('some #{1} text and id:#{2} ') |
32 | p.splitWords() |
33 | assert p.words == ['some', '#{1}', 'text', 'and', 'id:#{2}'] |
34 | } |
35 | |
36 | @Test |
37 | void addPlusSignToUnsignedWords() { |
38 | def p = new QueryProcessor('some #{1} -text and id:#{2} +text ') |
39 | p.splitWords() |
40 | p.addPlusSignToUnsignedWords() |
41 | assert p.words == ['+some', '+#{1}', '-text', '+and', '+id:#{2}', '+text'] |
42 | } |
43 | |
44 | @Test |
45 | void joinProcessedWords() { |
46 | def p = new QueryProcessor('') |
47 | p.words = ['+some', '-minus', '+#{1}'] |
48 | p.joinProcessedWords() |
49 | assert p.query == "+some -minus +#{1}" |
50 | } |
51 | |
52 | @Test |
53 | void replaceExpressions() { |
54 | def p = new QueryProcessor('+#{1} -minus +transaction_id:#{2}') |
55 | p.expressions = ['first', '(23 or 98)'] |
56 | p.replaceExpressions() |
57 | assert p.query == '+first -minus +transaction_id:(23 or 98)' |
58 | } |
59 | |
60 | @Test |
61 | void processedQuery() { |
62 | def p = new QueryProcessor('coca-cola -pepsi transaction_id:(34 or 76)') |
63 | assert p.processedQuery == '+coca-cola -pepsi +transaction_id:(34 or 76)' |
64 | } |
65 | } |
Conclusion
That is it. I’d like you to share your opinions on other techniques I may have not talked about here. Are there any improvements that you think would make this code even easier to understand? I’d really appreciate any other considerations you might have since I’m always very interested in writing Clean Code.